## **Personal Expenses Data Preparation**

### **Data Loading and First Look**
I export the data from the smartphone app I use to track my expenses. The export is a semicolon-delimited CSV file, so I can load it into a pandas DataFrame by specifying the delimiter. I also specify which columns to load and have the 'date' column parsed as dates.

In [121]:
import pandas as pd
import numpy as np

fname = "data/report_2022-09-11_101643.csv"
# load the data
df = pd.read_csv(
    fname,
    sep=";",
    usecols=[
        "date",
        "category",
        "account",
        "ref_currency_amount",
        "payment_type_local",
        "gps_latitude",
        "gps_longitude",
        "labels"
    ],
    parse_dates=["date"],
)
df.head()


Unnamed: 0,account,category,ref_currency_amount,payment_type_local,date,gps_latitude,gps_longitude,labels
0,Hanseatic Visa,Coffee,-1.4,Credit card,2022-08-31 14:33:25,,,BIG TRIP|Thailand|Pattaya
1,Hanseatic Visa,Groceries,-5.53,Credit card,2022-08-31 14:33:25,,,BIG TRIP|Thailand|Pattaya
2,Thai Baht cash,Public transport,-0.55,Cash,2022-08-31 14:02:09,,,BIG TRIP|Thailand|Pattaya
3,Thai Baht cash,Public transport,-0.27,Cash,2022-08-31 12:21:11,,,BIG TRIP|Thailand|Pattaya
4,Thai Baht cash,Coffee,-3.3,Cash,2022-08-31 11:12:21,,,Pattaya|Thailand|BIG TRIP


In [122]:
# rename the columns to something more meaningful to me (the order stays the same)
df = df.rename(
    columns={
        "ref_currency_amount": "amount",
        "payment_type_local": "payment_type",
        "gps_latitude": "lat",
        "gps_longitude": "lng",
    }
)
df.head()


Unnamed: 0,account,category,amount,payment_type,date,lat,lng,labels
0,Hanseatic Visa,Coffee,-1.4,Credit card,2022-08-31 14:33:25,,,BIG TRIP|Thailand|Pattaya
1,Hanseatic Visa,Groceries,-5.53,Credit card,2022-08-31 14:33:25,,,BIG TRIP|Thailand|Pattaya
2,Thai Baht cash,Public transport,-0.55,Cash,2022-08-31 14:02:09,,,BIG TRIP|Thailand|Pattaya
3,Thai Baht cash,Public transport,-0.27,Cash,2022-08-31 12:21:11,,,BIG TRIP|Thailand|Pattaya
4,Thai Baht cash,Coffee,-3.3,Cash,2022-08-31 11:12:21,,,Pattaya|Thailand|BIG TRIP


### **Data Cleaning and Preparation**

#### **Handling Missing Data**

In [123]:
# check summary of each column
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2264 entries, 0 to 2263
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   account       2264 non-null   object        
 1   category      2264 non-null   object        
 2   amount        2264 non-null   float64       
 3   payment_type  2264 non-null   object        
 4   date          2264 non-null   datetime64[ns]
 5   lat           0 non-null      float64       
 6   lng           0 non-null      float64       
 7   labels        2178 non-null   object        
dtypes: datetime64[ns](1), float64(3), object(4)
memory usage: 141.6+ KB


Here I see that the 'lat' and 'lng' geodata columns do not contain any values. That is fine for now: I will take the place names from the 'labels' column and look up the coordinates of the places I visited during my travels with a geocoding service.
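
Instead of reading the empty columns off the `info()` output, they can also be listed programmatically. The following is a minimal sketch on a small made-up frame (the values and the name `sample` are illustrative, not taken from the export):

```python
import pandas as pd

# toy frame standing in for the exported data (values are made up)
sample = pd.DataFrame(
    {
        "amount": [-1.4, -5.53, -0.55],
        "lat": [float("nan")] * 3,
        "lng": [float("nan")] * 3,
        "labels": ["BIG TRIP|Thailand|Pattaya", None, "Thailand"],
    }
)

# names of the columns in which every value is missing
empty_cols = [col for col in sample.columns if sample[col].isna().all()]
print(empty_cols)
```

A partially filled column such as `labels` is not reported, because only columns without a single value qualify.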

#### **Data Transformation**

In [124]:
# check for duplicate rows (True marks a row that repeats an earlier one)
df.duplicated()


0       False
1       False
2       False
3       False
4       False
        ...  
2259    False
2260    False
2261    False
2262    False
2263    False
Length: 2264, dtype: bool
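
The truncated boolean Series above is hard to read at a glance. A shorter check is to count the flagged rows and, if needed, drop them. This is a minimal sketch on made-up data; note that two genuine purchases with identical values would also be flagged, so whether to drop them is a judgment call:

```python
import pandas as pd

# toy frame with one exact duplicate row (values are made up)
sample = pd.DataFrame(
    {
        "category": ["Coffee", "Groceries", "Coffee"],
        "amount": [-1.4, -5.53, -1.4],
    }
)

# number of rows that repeat an earlier row
n_dupes = int(sample.duplicated().sum())

# keep the first occurrence of each row and renumber the index
deduped = sample.drop_duplicates().reset_index(drop=True)
print(n_dupes, len(deduped))
```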

In this step, I add the category names. The exported dataset contains only the subcategories, so I copied the category structure manually from the application and created a dictionary (***category : subcategories***).
After that, I map a category to each row based on its subcategory using pandas **map()**.

In [125]:
# create a dictionary with categories as keys and subcategories as values
# define the nature of expenses -- need or want

d = {
    "Food and Drinks": [
        "Food & Drinks",
        "Bar, cafe",
        "Groceries",
        "Restaurant, fast-food",
        "Fitness Supplements",
        "Coffee"
    ],
    "Shopping": [
        "Shopping",
        "Clothes & shoes",
        "Drug-store, chemist",
        "Electronics, accessories",
        "Camera expenses",
        "Free time",
        "Gifts, joy",
        "Health and beauty",
        "Home, garden",
        "Jewels, accessories",
        "Stationery, tools",
    ],
    "Housing": ["Housing", "Energy, utilities", "Maintenance, repairs", "Rent"],
    "Transportation": [
        "Transportation",
        "Business trips",
        "Long distance",
        "Public transport",
        "Taxi",
    ],
    "Vehicle": [
        "Vehicle",
        "Fuel",
        "Leasing",
        "Parking",
        "Rentals",
        "Vehicle insurance",
        "Vehicle maintenance",
    ],
    "Life and Entertainment": [
        "Life & Entertainment",
        "Active sport, fitness",
        "Alcohol, tobacco",
        "Books, audio, subscriptions",
        "Charity, gifts",
        "Culture, sport events",
        "Education, development",
        "Health care, doctor",
        "Hobbies",
        "Holiday, trips, hotels",
        "Sightseeing, activities",
        "Accommodation",
        "Life events",
        "Lottery, gambling",
        "TV, Streaming",
        "Wellness, beauty",
    ],
    "Communication and PC": [
        "Communication, PC",
        "Internet",
        "Phone, mobile phone",
        "Postal services",
        "Software, apps, games",
        "Phone, cell phone",
    ],
    "Financial Expenses": [
        "Financial expenses",
        "Advisory",
        "Charges, Fees",
        "Fines",
        "Insurances",
        "Loan, interests",
        "Taxes",
    ],
    "Investments": [
        "Investments",
        "Financial investments",
        "Collections",
        "Realty",
        "Savings",
        "Vehicles, chattels",
    ],
    "Income": ["Income", "Gifts", "Refunds (tax, purchase)", "Sale", "Wage, invoices"],
    "Other": ["Missing", "Other"],
}

d_nat = {
    "need": [
        "Food & Drinks",
        "Groceries",
        "Restaurant, fast-food",
        "Clothes & shoes",
        "Drug-store, chemist",
        "Home, garden",
        "Energy, utilities",
        "Maintenance, repairs",
        "Rent",
        "Transportation",
        "Long distance",
        "Public transport",
        "Taxi",
        "Internet",
        "Phone, mobile phone",
        "Postal services",
        "Phone, cell phone",
        "Charges, Fees",
        "Fines",
        "Insurances",
        "Loan, interests",
        "Taxes",
        "Other",
        "Missing",
        "Housing",
        'Financial expenses',
        'Communication, PC'
    ],

    'want': [
        "Bar, cafe",
        "Fitness Supplements",
        "Coffee",
        "Shopping",
        "Electronics, accessories",
        "Camera expenses",
        "Free time",
        "Gifts, joy",
        "Health and beauty",
        "Jewels, accessories",
        "Stationery, tools",
        "Business trips",
        "Vehicle",
        "Fuel",
        "Leasing",
        "Parking",
        "Rentals",
        "Vehicle insurance",
        "Vehicle maintenance",
        "Life & Entertainment",
        "Active sport, fitness",
        "Alcohol, tobacco",
        "Books, audio, subscriptions",
        "Charity, gifts",
        "Culture, sport events",
        "Education, development",
        "Health care, doctor",
        "Hobbies",
        "Holiday, trips, hotels",
        "Sightseeing, activities",
        "Accommodation",
        "Life events",
        "Lottery, gambling",
        "TV, Streaming",
        "Wellness, beauty",
        "Software, apps, games",
        "Advisory",
    ]
}

In [126]:
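# the dictionary needs to be inverted (subcategory -> category) before using the map function
def flatten_dict(d):
    """Map every item found in the dict's values back to its key."""
    nd = {}
    for k, v in d.items():
        # lists (or other non-string iterables) contribute one entry per item
        if hasattr(v, "__iter__") and not isinstance(v, str):
            for item in v:
                nd[item] = k
        else:
            # a single value maps straight to its key
            nd[v] = k
    return nd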
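
When every value in the dictionary is known to be a list, the same inversion can be written as a dict comprehension. The sketch below uses a shortened, hypothetical mapping (`groups`) in the same shape as `d`:

```python
# toy mapping in the same shape as `d` (category -> list of subcategories)
groups = {
    "Food and Drinks": ["Groceries", "Coffee"],
    "Transportation": ["Taxi", "Public transport"],
}

# invert it: each subcategory becomes a key pointing at its category
lookup = {sub: cat for cat, subs in groups.items() for sub in subs}
print(lookup["Coffee"])
```

If a subcategory were listed under two categories, the later one would win, which is also how `flatten_dict` behaves.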


In [127]:
# use the new function to flatten the dict
flatten_d = flatten_dict(d)
flatten_d_nat = flatten_dict(d_nat)


In [128]:
# change the column name of category column to subcategory
df = df.rename(columns={'category' : 'subcategory'})

In [129]:
# and finally map using the pandas map() function to assign the values
df["category"] = df["subcategory"].map(flatten_d)
df['nature'] = df['subcategory'].map(flatten_d_nat)
df.head()


Unnamed: 0,account,subcategory,amount,payment_type,date,lat,lng,labels,category,nature
0,Hanseatic Visa,Coffee,-1.4,Credit card,2022-08-31 14:33:25,,,BIG TRIP|Thailand|Pattaya,Food and Drinks,want
1,Hanseatic Visa,Groceries,-5.53,Credit card,2022-08-31 14:33:25,,,BIG TRIP|Thailand|Pattaya,Food and Drinks,need
2,Thai Baht cash,Public transport,-0.55,Cash,2022-08-31 14:02:09,,,BIG TRIP|Thailand|Pattaya,Transportation,need
3,Thai Baht cash,Public transport,-0.27,Cash,2022-08-31 12:21:11,,,BIG TRIP|Thailand|Pattaya,Transportation,need
4,Thai Baht cash,Coffee,-3.3,Cash,2022-08-31 11:12:21,,,Pattaya|Thailand|BIG TRIP,Food and Drinks,want
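
`map()` with a dictionary returns NaN for every value that has no matching key, so a misspelled or missing subcategory goes unnoticed unless it is checked. A minimal sketch of such a check, using a made-up lookup and a deliberately unknown value:

```python
import pandas as pd

lookup = {"Coffee": "Food and Drinks", "Taxi": "Transportation"}
sub = pd.Series(["Coffee", "Taxi", "Scuba diving"])  # last value is not in the lookup

mapped = sub.map(lookup)

# subcategories that did not receive a category
unmapped = sub[mapped.isna()].unique().tolist()
print(unmapped)
```

An empty list means that every subcategory was found in the dictionary.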


In [130]:
# rearrange the column order
df = df[
    [
        "date",
        "category",
        "subcategory",
        'nature',
        "amount",
        "account",
        "payment_type",
        "lat",
        "lng",
        "labels",
    ]
]
df.head()


Unnamed: 0,date,category,subcategory,nature,amount,account,payment_type,lat,lng,labels
0,2022-08-31 14:33:25,Food and Drinks,Coffee,want,-1.4,Hanseatic Visa,Credit card,,,BIG TRIP|Thailand|Pattaya
1,2022-08-31 14:33:25,Food and Drinks,Groceries,need,-5.53,Hanseatic Visa,Credit card,,,BIG TRIP|Thailand|Pattaya
2,2022-08-31 14:02:09,Transportation,Public transport,need,-0.55,Thai Baht cash,Cash,,,BIG TRIP|Thailand|Pattaya
3,2022-08-31 12:21:11,Transportation,Public transport,need,-0.27,Thai Baht cash,Cash,,,BIG TRIP|Thailand|Pattaya
4,2022-08-31 11:12:21,Food and Drinks,Coffee,want,-3.3,Thai Baht cash,Cash,,,Pattaya|Thailand|BIG TRIP


In [131]:
# convert the amount column to absolute value
df["amount"] = df["amount"].abs()
df.head()


Unnamed: 0,date,category,subcategory,nature,amount,account,payment_type,lat,lng,labels
0,2022-08-31 14:33:25,Food and Drinks,Coffee,want,1.4,Hanseatic Visa,Credit card,,,BIG TRIP|Thailand|Pattaya
1,2022-08-31 14:33:25,Food and Drinks,Groceries,need,5.53,Hanseatic Visa,Credit card,,,BIG TRIP|Thailand|Pattaya
2,2022-08-31 14:02:09,Transportation,Public transport,need,0.55,Thai Baht cash,Cash,,,BIG TRIP|Thailand|Pattaya
3,2022-08-31 12:21:11,Transportation,Public transport,need,0.27,Thai Baht cash,Cash,,,BIG TRIP|Thailand|Pattaya
4,2022-08-31 11:12:21,Food and Drinks,Coffee,want,3.3,Thai Baht cash,Cash,,,Pattaya|Thailand|BIG TRIP


Split the date column into year, month, day, day name and time columns to make the subsetting easier later on.

In [132]:
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df['weekday'] = df['date'].dt.day_name()
df['time'] = df['date'].dt.time
df.head()

Unnamed: 0,date,category,subcategory,nature,amount,account,payment_type,lat,lng,labels,year,month,day,weekday,time
0,2022-08-31 14:33:25,Food and Drinks,Coffee,want,1.4,Hanseatic Visa,Credit card,,,BIG TRIP|Thailand|Pattaya,2022,8,31,Wednesday,14:33:25
1,2022-08-31 14:33:25,Food and Drinks,Groceries,need,5.53,Hanseatic Visa,Credit card,,,BIG TRIP|Thailand|Pattaya,2022,8,31,Wednesday,14:33:25
2,2022-08-31 14:02:09,Transportation,Public transport,need,0.55,Thai Baht cash,Cash,,,BIG TRIP|Thailand|Pattaya,2022,8,31,Wednesday,14:02:09
3,2022-08-31 12:21:11,Transportation,Public transport,need,0.27,Thai Baht cash,Cash,,,BIG TRIP|Thailand|Pattaya,2022,8,31,Wednesday,12:21:11
4,2022-08-31 11:12:21,Food and Drinks,Coffee,want,3.3,Thai Baht cash,Cash,,,Pattaya|Thailand|BIG TRIP,2022,8,31,Wednesday,11:12:21
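
To illustrate how the extra columns simplify subsetting, here is a minimal sketch on made-up rows (the frame `sample` and its values are illustrative only). It selects one month and sums the amounts per year and month:

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2022-08-31 14:33:25", "2022-08-30 09:10:00", "2022-07-15 18:00:00"]
        ),
        "amount": [1.4, 5.53, 3.3],
    }
)
sample["year"] = sample["date"].dt.year
sample["month"] = sample["date"].dt.month

# all rows from August 2022
august = sample[(sample["year"] == 2022) & (sample["month"] == 8)]

# total amount per (year, month)
monthly = sample.groupby(["year", "month"])["amount"].sum()
print(len(august))
print(monthly)
```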


##### **Create a subset of the DataFrame containing expenses while travelling**

In [133]:
# create a df subset with data while travelling
cols = list(df.columns)
dftravel = df.loc[df["date"] > "2021-10-02T00:00:00", cols].reset_index(drop=True)
dftravel.head()


Unnamed: 0,date,category,subcategory,nature,amount,account,payment_type,lat,lng,labels,year,month,day,weekday,time
0,2022-08-31 14:33:25,Food and Drinks,Coffee,want,1.4,Hanseatic Visa,Credit card,,,BIG TRIP|Thailand|Pattaya,2022,8,31,Wednesday,14:33:25
1,2022-08-31 14:33:25,Food and Drinks,Groceries,need,5.53,Hanseatic Visa,Credit card,,,BIG TRIP|Thailand|Pattaya,2022,8,31,Wednesday,14:33:25
2,2022-08-31 14:02:09,Transportation,Public transport,need,0.55,Thai Baht cash,Cash,,,BIG TRIP|Thailand|Pattaya,2022,8,31,Wednesday,14:02:09
3,2022-08-31 12:21:11,Transportation,Public transport,need,0.27,Thai Baht cash,Cash,,,BIG TRIP|Thailand|Pattaya,2022,8,31,Wednesday,12:21:11
4,2022-08-31 11:12:21,Food and Drinks,Coffee,want,3.3,Thai Baht cash,Cash,,,Pattaya|Thailand|BIG TRIP,2022,8,31,Wednesday,11:12:21


In [134]:
# exclude the deposit records as they don't count as expenses
# the category label must match the key used in the dictionary `d` ("Financial Expenses")
dftravel = dftravel[
    ~(
        (dftravel["category"] == "Financial Expenses")
        & (dftravel["subcategory"] == "Loan, interests")
    )
]
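
Because the filter relies on exact string matches, it is worth counting how many rows the mask selects before dropping them; a count of zero usually means a label is spelled differently in the data. A minimal sketch on made-up rows:

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "category": ["Financial Expenses", "Food and Drinks", "Financial Expenses"],
        "subcategory": ["Loan, interests", "Coffee", "Taxes"],
    }
)

# rows that satisfy both conditions
mask = (sample["category"] == "Financial Expenses") & (
    sample["subcategory"] == "Loan, interests"
)
n_matches = int(mask.sum())  # rows the filter will remove

# keep everything that does not match
kept = sample[~mask]
print(n_matches, len(kept))
```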


###### **Split the *Labels* column into 4 columns as it contains multiple values**

In [135]:
dftravel[["l1", "l2", "l3", "l4"]] = dftravel["labels"].str.rsplit("|", expand=True)
dftravel[["l1", "l3", "l3", "l4"]]


Unnamed: 0,l1,l3,l3.1,l4
0,BIG TRIP,Pattaya,Pattaya,
1,BIG TRIP,Pattaya,Pattaya,
2,BIG TRIP,Pattaya,Pattaya,
3,BIG TRIP,Pattaya,Pattaya,
4,Pattaya,BIG TRIP,BIG TRIP,
...,...,...,...,...
2108,BIG TRIP,Phuket,Phuket,
2109,BIG TRIP,Accommodation,Accommodation,Phuket
2110,Thailand,,,
2111,Thailand,,,


The values are mixed across these 4 label columns. I convert the Series to lists so that I can move each value into the correct place.

In [136]:
# save the split columns to lists to iterate over them and change the values
list_1 = dftravel["l1"].to_list()
list_2 = dftravel["l2"].to_list()
list_3 = dftravel["l3"].to_list()
list_4 = dftravel["l4"].to_list()


In [137]:

places = list(dftravel["l3"].unique())  # unique values of l3 (mostly place names)
del_place = [1, 4, 6, 7, 19]  # positions of invalid names or NaN values in `places`
places_1 = np.delete(places, del_place).tolist()  # remove them using numpy and convert back to a list


In [138]:
# l3 holds the correct place name for most rows.
# Walk through the rows in parallel: keep l3 when it is a known place,
# otherwise take the place from another label column of the same row.
nvalid = ("BIG TRIP", "Thailand")
place = []
for v1, v2, v3, v4 in zip(list_1, list_2, list_3, list_4):
    if v3 in places_1:
        place.append(v3)
    elif v3 in nvalid and v2 in nvalid:
        # l2 and l3 are generic labels, so the place is in l1
        place.append(v1)
    elif v3 in nvalid and v1 in nvalid:
        # l1 and l3 are generic labels, so the place is in l2
        place.append(v2)
    elif v3 == "Accommodation":
        # the place follows the "Accommodation" label in l4
        place.append(v4)
    else:
        # NaN and unrecognised values are handled in the next steps
        place.append(v3)
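
An alternative that avoids the intermediate lists is to work on each label string directly and keep the first token that is not a generic tag. This is only a sketch: the function name `pick_place` is hypothetical, and the set of generic tags is an assumption based on the labels shown above.

```python
GENERIC = {"BIG TRIP", "Thailand", "Accommodation"}  # assumed non-place tags


def pick_place(labels):
    """Return the first label that is not a generic tag, or None."""
    if not isinstance(labels, str):
        return None  # missing labels (NaN) carry no place
    for token in labels.split("|"):
        if token not in GENERIC:
            return token
    return None


print(pick_place("Pattaya|Thailand|BIG TRIP"))
```

It could be applied with `dftravel["labels"].map(pick_place)`; rows that return `None` would still need the forward fill used in the next step.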


In [139]:
# append the new list to the dataframe
dftravel["place"] = place


In [140]:
# fill missing values in the place column with a forward fill
dftravel["place"] = dftravel["place"].ffill()
dftravel.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 2113 entries, 0 to 2112
Data columns (total 20 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   date          2113 non-null   datetime64[ns]
 1   category      2113 non-null   object        
 2   subcategory   2113 non-null   object        
 3   nature        2113 non-null   object        
 4   amount        2113 non-null   float64       
 5   account       2113 non-null   object        
 6   payment_type  2113 non-null   object        
 7   lat           0 non-null      float64       
 8   lng           0 non-null      float64       
 9   labels        2088 non-null   object        
 10  year          2113 non-null   int64         
 11  month         2113 non-null   int64         
 12  day           2113 non-null   int64         
 13  weekday       2113 non-null   object        
 14  time          2113 non-null   object        
 15  l1            2088 non-null   object  

In [141]:
# change values that were not correctly filled in previous step
dftravel.loc[dftravel["place"] == "Accommodation", ["place"]] = "Phuket"
dftravel.loc[dftravel["place"] == "Road trip", ["place"]] = "Sangkhlaburi"
dftravel.loc[dftravel["place"] == "BIG TRIP", ["place"]] = "Bangkok"


In [142]:
# create a new column 'country'
dftravel["country"] = "Thailand"


In [143]:
# finally drop the columns that are no longer needed
dftravel.drop(["labels", "l1", "l2", "l3", "l4"], axis=1, inplace=True)


In [144]:
# check summary for each column to spot possible issues
dftravel.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 2113 entries, 0 to 2112
Data columns (total 16 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   date          2113 non-null   datetime64[ns]
 1   category      2113 non-null   object        
 2   subcategory   2113 non-null   object        
 3   nature        2113 non-null   object        
 4   amount        2113 non-null   float64       
 5   account       2113 non-null   object        
 6   payment_type  2113 non-null   object        
 7   lat           0 non-null      float64       
 8   lng           0 non-null      float64       
 9   year          2113 non-null   int64         
 10  month         2113 non-null   int64         
 11  day           2113 non-null   int64         
 12  weekday       2113 non-null   object        
 13  time          2113 non-null   object        
 14  place         2113 non-null   object        
 15  country       2113 non-null   object  

#### **Get latitude and longitude for the places**

In [145]:
import urllib.request, urllib.parse, urllib.error
import json
import ssl

api_key = False
# If you have a Google Places API key, enter it here
# api_key = 'AIzaSy___IDByT70'
# https://developers.google.com/maps/documentation/geocoding/intro

if api_key is False:
    api_key = 42
    serviceurl = 'http://py4e-data.dr-chuck.net/json?'
else:
    serviceurl = 'https://maps.googleapis.com/maps/api/geocode/json?'

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

keys = dftravel["place"].unique()
geodata = list()

for place in keys:
    parms = dict()
    parms['address'] = place
    
    if api_key is not False: parms['key'] = api_key
    url = serviceurl + urllib.parse.urlencode(parms)

    print('Retrieving', url)
    uh = urllib.request.urlopen(url, context=ctx)
    data = uh.read().decode()
    print('Retrieved', len(data), 'characters')

    try:
        js = json.loads(data)
    except json.JSONDecodeError:
        js = None

    if not js or 'status' not in js or js['status'] != 'OK':
        print('==== Failure To Retrieve ====')
        print(data)
        # store NaN coordinates so that geodata stays aligned with keys
        geodata.append([float('nan'), float('nan')])
        continue

    lat = js['results'][0]['geometry']['location']['lat']
    lng = js['results'][0]['geometry']['location']['lng']
    geodata.append([lat,lng])
    print('lat', lat, 'lng', lng)
    location = js['results'][0]['formatted_address']
    print(location)

Retrieving http://py4e-data.dr-chuck.net/json?address=Pattaya&key=42
Retrieved 1938 characters
lat 12.9235557 lng 100.8824551
Pattaya City, Bang Lamung District, Chon Buri 20150, Thailand
Retrieving http://py4e-data.dr-chuck.net/json?address=Bangkok&key=42
Retrieved 1526 characters
lat 13.7563309 lng 100.5017651
Bangkok, Thailand
Retrieving http://py4e-data.dr-chuck.net/json?address=Chiang+Mai&key=42
Retrieved 1840 characters
lat 18.7883439 lng 98.98530079999999
Chiang Mai, Mueang Chiang Mai District, Chiang Mai, Thailand
Retrieving http://py4e-data.dr-chuck.net/json?address=Koh+Chang&key=42
Retrieved 1597 characters
lat 12.0479159 lng 102.3234816
Ko Chang District, Trat, Thailand
Retrieving http://py4e-data.dr-chuck.net/json?address=Koh+Kud&key=42
Retrieved 3564 characters
lat 11.6680759 lng 102.5642261
Koh Kood, Ko Kut, Ko Kut District, Trat, Thailand
Retrieving http://py4e-data.dr-chuck.net/json?address=Ratchaburi&key=42
Retrieved 1393 characters
lat 13.5282893 lng 99.8134211
Ratcha
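
The request URLs in the output above are built by `urllib.parse.urlencode`, which escapes the query parameters (a space becomes `+`). A minimal sketch that reproduces one of them without sending a request:

```python
from urllib.parse import urlencode

serviceurl = "http://py4e-data.dr-chuck.net/json?"
params = {"address": "Chiang Mai", "key": 42}

# urlencode escapes the values and joins them with '&'
url = serviceurl + urlencode(params)
print(url)
# http://py4e-data.dr-chuck.net/json?address=Chiang+Mai&key=42
```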

In [146]:
# use the zip function to make a dict from two lists
geo_dict = dict(zip(keys, geodata))
geo_dict


{'Pattaya': [12.9235557, 100.8824551],
 'Bangkok': [13.7563309, 100.5017651],
 'Chiang Mai': [18.7883439, 98.98530079999999],
 'Koh Chang': [12.0479159, 102.3234816],
 'Koh Kud': [11.6680759, 102.5642261],
 'Ratchaburi': [13.5282893, 99.8134211],
 'Hua Hin': [12.5683747, 99.9576888],
 'Khao Yai': [14.4391554, 101.3722299],
 'Ko Larn': [12.9182259, 100.7802624],
 'Sangkhlaburi': [15.1542081, 98.45306579999999],
 'Kanchanaburi': [14.1011393, 99.4179431],
 'Suratthani': [9.134194899999999, 99.3334198],
 'Khao Sok': [8.9873143, 98.6294329],
 'Krabi': [8.0854803, 98.9062856],
 'Phuket': [7.8804479, 98.3922504]}
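
`zip` pairs the two sequences purely by position, so it relies on `geodata` holding exactly one entry per key, in the same order. An alternative is to fill the dictionary inside the lookup loop, keyed by the place name. The sketch below uses a hypothetical stand-in function (`fake_lookup`) instead of a real request; the coordinates are copied from the output above.

```python
def fake_lookup(name):
    """Stand-in for the geocoding request; returns None on failure."""
    known = {
        "Pattaya": [12.9235557, 100.8824551],
        "Bangkok": [13.7563309, 100.5017651],
    }
    return known.get(name)


names = ["Pattaya", "Nowhere", "Bangkok"]
coords = {}
for name in names:
    result = fake_lookup(name)
    if result is None:
        continue  # a failed lookup simply leaves the name out
    coords[name] = result

print(coords["Bangkok"])
```

With this layout a failed lookup cannot shift the coordinates of the places that follow it.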

In [148]:
# and finally map the dict values to the dataframe
dftravel["gdata"] = dftravel["place"].map(geo_dict)
dftravel.head()


Unnamed: 0,date,category,subcategory,nature,amount,account,payment_type,lat,lng,year,month,day,weekday,time,place,country,gdata
0,2022-08-31 14:33:25,Food and Drinks,Coffee,want,1.4,Hanseatic Visa,Credit card,,,2022,8,31,Wednesday,14:33:25,Pattaya,Thailand,"[12.9235557, 100.8824551]"
1,2022-08-31 14:33:25,Food and Drinks,Groceries,need,5.53,Hanseatic Visa,Credit card,,,2022,8,31,Wednesday,14:33:25,Pattaya,Thailand,"[12.9235557, 100.8824551]"
2,2022-08-31 14:02:09,Transportation,Public transport,need,0.55,Thai Baht cash,Cash,,,2022,8,31,Wednesday,14:02:09,Pattaya,Thailand,"[12.9235557, 100.8824551]"
3,2022-08-31 12:21:11,Transportation,Public transport,need,0.27,Thai Baht cash,Cash,,,2022,8,31,Wednesday,12:21:11,Pattaya,Thailand,"[12.9235557, 100.8824551]"
4,2022-08-31 11:12:21,Food and Drinks,Coffee,want,3.3,Thai Baht cash,Cash,,,2022,8,31,Wednesday,11:12:21,Pattaya,Thailand,"[12.9235557, 100.8824551]"


In [149]:
# latitude and longitude are stored together in one column, so I split it into two columns
dftravel[["lat", "lng"]] = pd.DataFrame(dftravel.gdata.to_list(), index=dftravel.index)
dftravel.drop("gdata", axis=1, inplace=True)
print(dftravel.dtypes)

date            datetime64[ns]
category                object
subcategory             object
nature                  object
amount                 float64
account                 object
payment_type            object
lat                    float64
lng                    float64
year                     int64
month                    int64
day                      int64
weekday                 object
time                    object
place                   object
country                 object
dtype: object
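
Passing `index=dftravel.index` matters whenever the index has gaps, for example after rows were filtered out: pandas aligns the assignment on index labels, not on position. A minimal sketch on a made-up frame with a non-contiguous index:

```python
import pandas as pd

# non-contiguous index, as left behind by a row filter (values are made up)
sample = pd.DataFrame({"gdata": [[12.9, 100.8], [13.7, 100.5]]}, index=[0, 2])

# without index=, the helper frame would be labelled 0, 1 and row 2 would get NaN
sample[["lat", "lng"]] = pd.DataFrame(sample["gdata"].to_list(), index=sample.index)
print(sample[["lat", "lng"]])
```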


In [150]:
# adjust the column order; reindex keeps only the listed columns,
# so 'place' and 'country' are dropped here
col_order = ['date',
             'year',
             'month',
             'day',
             'weekday',
             'time',
             'category',
             'subcategory',
             'nature',
             'amount',
             'account',
             'payment_type',
             'lat',
             'lng']

dftravel = dftravel.reindex(columns=col_order)
dftravel.head()


Unnamed: 0,date,year,month,day,weekday,time,category,subcategory,nature,amount,account,payment_type,lat,lng
0,2022-08-31 14:33:25,2022,8,31,Wednesday,14:33:25,Food and Drinks,Coffee,want,1.4,Hanseatic Visa,Credit card,12.923556,100.882455
1,2022-08-31 14:33:25,2022,8,31,Wednesday,14:33:25,Food and Drinks,Groceries,need,5.53,Hanseatic Visa,Credit card,12.923556,100.882455
2,2022-08-31 14:02:09,2022,8,31,Wednesday,14:02:09,Transportation,Public transport,need,0.55,Thai Baht cash,Cash,12.923556,100.882455
3,2022-08-31 12:21:11,2022,8,31,Wednesday,12:21:11,Transportation,Public transport,need,0.27,Thai Baht cash,Cash,12.923556,100.882455
4,2022-08-31 11:12:21,2022,8,31,Wednesday,11:12:21,Food and Drinks,Coffee,want,3.3,Thai Baht cash,Cash,12.923556,100.882455


#### **Write the data to CSV**

In [151]:
dftravel.to_csv("data/2022-09-27_travel_expenses.csv", index=False)


#### **Write the data to a SQLite database file**

In [152]:
# write data to a SQLite database file
import sqlite3 as sq

sql_data = "data/EXPENSES.db"
conn = sq.connect(sql_data)
# if_exists="replace" drops and recreates the table, so no separate DROP TABLE is needed
dftravel.to_sql("dftravel", conn, if_exists="replace", index=False)
conn.commit()
conn.close()
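
To confirm that the table was written as expected, it can be read back with `pd.read_sql_query`. The sketch below uses a throwaway in-memory database and a hypothetical table name (`expenses_demo`) so that it does not touch the real file:

```python
import sqlite3

import pandas as pd

sample = pd.DataFrame({"category": ["Coffee", "Taxi"], "amount": [1.4, 3.3]})

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
try:
    # write the frame, then read it back with a plain SQL query
    sample.to_sql("expenses_demo", conn, if_exists="replace", index=False)
    back = pd.read_sql_query("SELECT category, amount FROM expenses_demo", conn)
finally:
    conn.close()

print(back)
```

The `try`/`finally` block closes the connection even if the write or the query fails.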
