# Enriching GSB23 Survey Data with Geospatial Mapping & Multilingual Labels

In this notebook, we build on our combined 2023 survey dataset to:

1. **Map municipality responses**  
   - Load municipal boundaries from `municipalities.json`  
   - Match each respondent’s commune ID to its GeoJSON feature  
   - Embed survey answers into the GeoJSON properties  
   - Export `commune_responses.csv` for downstream mapping

2. **Generate human-readable translations**  
   - Clean question labels by removing leading numbers  
   - Use Facebook’s mBART model to translate German labels into English  
   - Cascade translations from English → French → Italian  
   - Mark Romanian translations as “not available” for now

3. **Export the final enriched dataset**  
   - Save the fully geocoded, multilingual DataFrame to `data/combined_df.csv`  

With these steps, you’ll have a ready-to-use file for both geospatial visualization and multilingual reporting.  


In [76]:
import pandas as pd
import json

In [None]:
# Load the data
df_codebook_23 = pd.read_excel("data/Extraction CodeBook - 3. Cleaned.xlsx", sheet_name="2023")
df_gsb_23 = pd.read_excel("data/GSB 2023_V1.xlsx")
df_qg = pd.read_excel("data/QuestionGlobales.xlsx")

In [78]:
df_qg.head(10)

Unnamed: 0,label,text_de,text_fr,text_it,text_ro,text_en,category_label,category_text_de,category_text_fr,category_text_it,category_text_ro,category_text_en,options_value,options_label
0,gemid,Gemeindeidentifikation (BFS),Identification de la commune (OFS),Identificazione del comune (BFS),,Municipality identification (BFS),,,,,,,,
1,gemidname,Gemeindename,Nom de la commune,Nome del comune,,Municipality name,,,,,,,,
2,hist,Historische Gemeindenummer (BFS),Numéro historique de la commune (OFS),Numero storico del comune (BFS),,Historical municipality number (BFS),,,,,,,,
3,kant,Kantonszugehörigkeit Gemeinden,Appartenance cantonale Communes,Affiliazione cantonale Comuni,,Cantonal affiliation Communes,Canton,Canton,Canton,Canton,Canton,Canton,1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;1...,ZH;BE;LU;UR;SZ;OW;NW;GL;ZG;FR;SO;BS;BL;SH;AR;A...
4,pop,Einwohnerzahl BFS,Nombre d'habitants OFS,Popolazione BFS,,Population BFS,,,,,,,,
5,spr,Sprachgebiete der Schweiz,Régions linguistiques de la Suisse,Aree linguistiche della Svizzera,,Linguistic areas of Switzerland,Sprache,Sprache,Langue,Lingua,,Language,1;2;3;4,de;fr;it;ro
6,bez,Bezirksnummer,Numéro de district,Numero del distretto,,District number,,,,,,,,
7,bezname,Bezirksname,Nom du district,Nome del distretto,,District name,,,,,,,,


In [79]:
df_codebook_23.head(50)

Unnamed: 0,code_original,code,enquete,num_question,cat_reponse,year,label,type,format,text_de,text_fr,text_it,text_ro,text_en
0,GSB23_Teilnah- me,GSB23_Teilnah- me,GSB23,Teilnah- me,,2023,Hat die Gemeinde an der Gemeindeschreiberbefra...,discrete,numeric-40.0,,,,,
1,GSB23_Mode,GSB23_Mode,GSB23,Mode,,2023,Teilnahme-Modus,discrete,numeric-40.0,,,,,
2,GSB23_StartDate,GSB23_StartDate,GSB23,StartDate,,2023,Start Date,discrete,character-20,,,,,
3,GSB23_EndDate,GSB23_EndDate,GSB23,EndDate,,2023,End Date,discrete,character-20,,,,,
4,GSB23_Progress,GSB23_Progress,GSB23,Progress,,2023,Progress,continuous,numeric-40.2,,,,,
5,GSB23_Duration,GSB23_Duration,GSB23,Duration,,2023,Duration (in seconds),continuous,numeric-40.2,,,,,
6,GSB23_Finished,GSB23_Finished,GSB23,Finished,,2023,Finished,discrete,numeric-40.0,,,,,
7,GSB23_Recorded,GSB23_Recorded,GSB23,Recorded,,2023,Recorded Date,discrete,character-20,,,,,
8,GSB23_UserLang,GSB23_UserLang,GSB23,UserLang,,2023,User Language,discrete,character-200,,,,,
9,GSB23_Q99,GSB23_Q99,GSB23,Q99,,2023,In welchem Kanton liegt Ihre Gemeinde?,continuous,numeric-40.0,,,,,


In [80]:
df_gsb_23.head()

Unnamed: 0,BFS_2023,Gemeinde_2023,GSB23_Teilnahme,GSB23_Mode,GSB23_StartDate,GSB23_EndDate,GSB23_Progress,GSB23_Duration__in_seconds_,GSB23_Finished,GSB23_RecordedDate,...,gexesvpat,gexespsat,gexeandat,gexeploat,stfussv,orgpart,autonomie,zamitkt,wahlverf,legislative
0,1,Aeugst am Albis,1,0.0,2023-02-16 08:34:08,2023-02-16 10:16:09,100.0,6121.0,1.0,2023-02-16 10:16:10,...,0.0,0.0,66.666667,0.0,1.0,1.0,7.0,3.0,1.0,1.0
1,2,Affoltern am Albis,1,0.0,2023-02-16 08:50:23,2023-02-17 10:17:44,100.0,91641.0,1.0,2023-02-17 10:17:45,...,12.5,12.5,50.0,12.5,2.0,1.0,5.0,3.0,1.0,1.0
2,3,Bonstetten,1,0.0,2023-03-30 10:16:48,2023-03-30 11:49:16,100.0,5547.0,1.0,2023-03-30 11:49:17,...,16.666667,0.0,0.0,66.666667,2.0,1.0,5.0,2.0,1.0,1.0
3,4,Hausen am Albis,1,0.0,2023-02-28 10:27:13,2023-02-28 11:45:36,100.0,4702.0,1.0,2023-02-28 11:45:37,...,14.285714,14.285714,0.0,42.857143,2.0,1.0,6.0,2.0,1.0,1.0
4,5,Hedingen,1,0.0,2023-02-20 14:13:19,2023-02-20 14:34:49,40.0,1289.0,0.0,NaT,...,,,,,3.0,,,,,


In [81]:
print(df_gsb_23.columns)

Index(['BFS_2023', 'Gemeinde_2023', 'GSB23_Teilnahme', 'GSB23_Mode',
       'GSB23_StartDate', 'GSB23_EndDate', 'GSB23_Progress',
       'GSB23_Duration__in_seconds_', 'GSB23_Finished', 'GSB23_RecordedDate',
       ...
       'gexesvpat', 'gexespsat', 'gexeandat', 'gexeploat', 'stfussv',
       'orgpart', 'autonomie', 'zamitkt', 'wahlverf', 'legislative'],
      dtype='object', length=428)


In [82]:
df_gsb_23.head(4)
df_gsb_23[df_gsb_23["BFS_2023"] == 5586]

Unnamed: 0,BFS_2023,Gemeinde_2023,GSB23_Teilnahme,GSB23_Mode,GSB23_StartDate,GSB23_EndDate,GSB23_Progress,GSB23_Duration__in_seconds_,GSB23_Finished,GSB23_RecordedDate,...,gexesvpat,gexespsat,gexeandat,gexeploat,stfussv,orgpart,autonomie,zamitkt,wahlverf,legislative
1696,5586,Lausanne,1,0.0,2023-02-23 17:34:33,2023-02-27 17:21:54,100.0,344840.0,1.0,2023-02-27 17:21:54,...,0.0,42.857143,42.857143,0.0,3.0,1.0,3.0,2.0,1.0,2.0


In [83]:
df_gsb_23.shape

(2136, 428)

data merging 

In [84]:
df_base_questions = df_codebook_23[df_codebook_23['code'].str.contains(r'GSB23_Q\d+\b')]
base_question_ids = df_base_questions['code'].str.extract(r'GSB23_Q(\d+)')[0].astype(int).unique()
df_suffix_questions = df_codebook_23[df_codebook_23['code'].str.contains(r'GSB23_Q\d+_1\b')]
for question_id in base_question_ids:
    df_suffix_questions = df_suffix_questions[~df_suffix_questions['code'].str.contains(f'GSB23_Q{question_id}_1\b')]
df_first_subquestions = pd.concat([df_base_questions, df_suffix_questions], ignore_index=True)
df_first_subquestions.reset_index(drop=True, inplace=True)

df_first_subquestions[df_first_subquestions['num_question'] == 'Q1']

Unnamed: 0,code_original,code,enquete,num_question,cat_reponse,year,label,type,format,text_de,text_fr,text_it,text_ro,text_en
41,GSB23_Q1_1,GSB23_Q1_1,GSB23,Q1,1,2023,1. Gemeinden können in unterschiedlichem Masse...,continuous,numeric-40.0,,,,,


In [85]:
df_first_subquestions[df_first_subquestions["enquete"] == "GSB23"]

Unnamed: 0,code_original,code,enquete,num_question,cat_reponse,year,label,type,format,text_de,text_fr,text_it,text_ro,text_en
0,GSB23_Q99,GSB23_Q99,GSB23,Q99,,2023,In welchem Kanton liegt Ihre Gemeinde?,continuous,numeric-40.0,,,,,
1,GSB23_Q100,GSB23_Q100,GSB23,Q100,,2023,In welcher Gemeinde sind Sie tätig? (falls Sie...,continuous,numeric-40.0,,,,,
2,GSB23_Q101,GSB23_Q101,GSB23,Q101,,2023,"Falls Sie Ihre Gemeinde nicht gefunden haben, ...",discrete,character-200,,,,,
3,GSB23_Q10,GSB23_Q10,GSB23,Q10,,2023,1. Gemeinden können in unterschiedlichem Masse...,continuous,numeric-40.0,,,,,
4,GSB23_Q4,GSB23_Q4,GSB23,Q4,,2023,4. Wie hat sich der Steuerfuss 2023 (die Steue...,continuous,numeric-40.0,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
80,GSB23_Q51_1,GSB23_Q51_1,GSB23,Q51,1,2023,51. Welche Führungsaufgaben hat der/ die Gemei...,continuous,numeric-40.0,,,,,
81,GSB23_Q53_1,GSB23_Q53_1,GSB23,Q53,1,2023,53. Welche Führungsaufgaben haben die übrigen ...,continuous,numeric-40.0,,,,,
82,GSB23_Q59_1,GSB23_Q59_1,GSB23,Q59,1,2023,59. Welche Kriterien waren bei Ihrer Anstellun...,continuous,numeric-40.0,,,,,
83,GSB23_Q60_1,GSB23_Q60_1,GSB23,Q60,1,2023,60. Wem fühlen Sie sich bei der Erfüllung Ihre...,continuous,numeric-40.0,,,,,


In [86]:
df_first_subquestions.shape

(85, 14)

In [87]:
# Step 1: Liste des questions supplémentaires à ajouter
extra_questions_codes = [
    'GSB23_Teilnahme', 'GSB23_Mode', 'GSB23_StartDate', 'GSB23_EndDate', 
    'GSB23_Progress', 'GSB23_Duration', 'GSB23_Finished', 'GSB23_Recorded', 
    'GSB23_UserLang', 'GSB23_Q99', 'GSB23_Q100', 'GSB23_Q101'
]

# Step 2: Filtrer le df_codebook pour ces questions
df_extra_questions = df_codebook_23[df_codebook_23['code'].isin(extra_questions_codes)]

# Step 3: Combiner les 10 premières sous-questions avec les questions supplémentaires
df_combined = pd.concat([df_first_subquestions, df_extra_questions])

# Step 4: Réinitialiser les index du DataFrame combiné
df_combined.reset_index(drop=True, inplace=True)

# Step 5: Afficher le DataFrame combiné
df_combined.head(25)

Unnamed: 0,code_original,code,enquete,num_question,cat_reponse,year,label,type,format,text_de,text_fr,text_it,text_ro,text_en
0,GSB23_Q99,GSB23_Q99,GSB23,Q99,,2023,In welchem Kanton liegt Ihre Gemeinde?,continuous,numeric-40.0,,,,,
1,GSB23_Q100,GSB23_Q100,GSB23,Q100,,2023,In welcher Gemeinde sind Sie tätig? (falls Sie...,continuous,numeric-40.0,,,,,
2,GSB23_Q101,GSB23_Q101,GSB23,Q101,,2023,"Falls Sie Ihre Gemeinde nicht gefunden haben, ...",discrete,character-200,,,,,
3,GSB23_Q10,GSB23_Q10,GSB23,Q10,,2023,1. Gemeinden können in unterschiedlichem Masse...,continuous,numeric-40.0,,,,,
4,GSB23_Q4,GSB23_Q4,GSB23,Q4,,2023,4. Wie hat sich der Steuerfuss 2023 (die Steue...,continuous,numeric-40.0,,,,,
5,GSB23_Q5,GSB23_Q5,GSB23,Q5,,2023,5. Wie hat sich der reale Ertrag aus der Einko...,continuous,numeric-40.0,,,,,
6,GSB23_Q6,GSB23_Q6,GSB23,Q6,,2023,6. Wie hat sich die Nettoverschuldung in Ihrer...,continuous,numeric-40.0,,,,,
7,GSB23_Q7,GSB23_Q7,GSB23,Q7,,2023,7. Wie steht Ihre Gemeinde im kantonalen Finan...,continuous,numeric-40.0,,,,,
8,GSB23_Q8,GSB23_Q8,GSB23,Q8,,2023,8. Wie gross ist der Anteil der\nFinanzausglei...,continuous,numeric-40.0,,,,,
9,GSB23_Q9,GSB23_Q9,GSB23,Q9,,2023,9. Wie gross ist der Anteil der Steuereinnahme...,continuous,numeric-40.0,,,,,


In [88]:
df_combined.shape

(96, 14)

In [89]:
df_combined.columns

Index(['code_original', 'code', 'enquete', 'num_question', 'cat_reponse',
       'year', 'label', 'type', 'format', 'text_de', 'text_fr', 'text_it',
       'text_ro', 'text_en'],
      dtype='object')

In [90]:
with open('municipalities.json', encoding="utf-8") as f:
    municipalities_data = json.load(f)

list_of_commune_ids = [feature['properties']['id'] for feature in municipalities_data['features']]
#print(list_of_commune_ids)

# the column GSB23_Q100 contains the commune ids
df_filtered = df_gsb_23[df_gsb_23['GSB23_Q100'].isin(list_of_commune_ids)]

col_of_interest = df_combined['code_original'].tolist()

print(col_of_interest)
# Filtrer seulement les colonnes qui existent dans df_filtered
col_of_interest_filtered = [col for col in col_of_interest if col in df_filtered.columns]

df_commune_responses = df_filtered[col_of_interest_filtered]
df_commune_responses.head()

['GSB23_Q99', 'GSB23_Q100', 'GSB23_Q101', 'GSB23_Q10', 'GSB23_Q4', 'GSB23_Q5', 'GSB23_Q6', 'GSB23_Q7', 'GSB23_Q8', 'GSB23_Q9', 'GSB23_Q10', 'GSB23_Q11', 'GSB23_Q17', 'GSB23_Q20', 'GSB23_Q21', 'GSB23_Q22', 'GSB23_Q24', 'GSB23_Q26', 'GSB23_Q27', 'GSB23_Q30', 'GSB23_Q31', 'GSB23_Q32', 'GSB23_Q33', 'GSB23_Q34', 'GSB23_Q38', 'GSB23_Q39', 'GSB23_Q41', 'GSB23_Q43', 'GSB23_Q45', 'GSB23_Q46', 'GSB23_Q49', 'GSB23_Q50', 'GSB23_Q52', 'GSB23_Q54', 'GSB23_Q55', 'GSB23_Q56', 'GSB23_Q58', 'GSB23_Q61', 'GSB23_Q62', 'GSB23_Q63', 'GSB23_Q64', 'GSB23_Q1_1', 'GSB23_Q2_1', 'GSB23_Q3_1', 'GSB23_Q12_1', 'GSB23_Q13_1', 'GSB23_Q14_1', 'GSB23_Q15_1', 'GSB23_Q16_1', 'GSB23_Q18_1', 'GSB23_Q19_1', 'GSB23_Q21_1', 'GSB23_Q23_1', 'GSB23_Q25_1', 'GSB23_Q35_1\n_10', 'GSB23_Q35_1\n_11', 'GSB23_Q35_1\n_12', 'GSB23_Q35_1\n_13', 'GSB23_Q35_1\n_14', 'GSB23_Q35_1\n_15', 'GSB23_Q35_1\n_16', 'GSB23_Q35_1\n_17', 'GSB23_Q36_1', 'GSB23_Q44_1\n_10_1', 'GSB23_Q44_1\n_10_2', 'GSB23_Q44_1\n_11_1', 'GSB23_Q44_1\n_11_2', 'GSB23_Q44_1\n_

Unnamed: 0,GSB23_Q99,GSB23_Q100,GSB23_Q101,GSB23_Q10,GSB23_Q4,GSB23_Q5,GSB23_Q6,GSB23_Q7,GSB23_Q8,GSB23_Q9,...,GSB23_Q60_1,GSB23_Q65_1,GSB23_Mode,GSB23_StartDate,GSB23_EndDate,GSB23_Progress,GSB23_Finished,GSB23_Q99.1,GSB23_Q100.1,GSB23_Q101.1
0,1.0,1.0,,1.0,1.0,3.0,2.0,1.0,-2.0,3.0,...,4.0,-2.0,0.0,2023-02-16 08:34:08,2023-02-16 10:16:09,100.0,1.0,1.0,1.0,
1,1.0,2.0,,2.0,2.0,2.0,2.0,2.0,99.0,99.0,...,3.0,-2.0,0.0,2023-02-16 08:50:23,2023-02-17 10:17:44,100.0,1.0,1.0,2.0,
2,1.0,3.0,,2.0,2.0,2.0,5.0,2.0,2.0,3.0,...,4.0,1.0,0.0,2023-03-30 10:16:48,2023-03-30 11:49:16,100.0,1.0,1.0,3.0,
3,1.0,4.0,,2.0,2.0,99.0,99.0,99.0,-2.0,99.0,...,4.0,-2.0,0.0,2023-02-28 10:27:13,2023-02-28 11:45:36,100.0,1.0,1.0,4.0,
4,1.0,5.0,,1.0,3.0,2.0,99.0,4.0,-2.0,5.0,...,,,0.0,2023-02-20 14:13:19,2023-02-20 14:34:49,40.0,0.0,1.0,5.0,


In [91]:
df_commune_responses = df_commune_responses.loc[:, ~df_commune_responses.columns.duplicated()]

In [92]:
print(df_commune_responses.columns)


Index(['GSB23_Q99', 'GSB23_Q100', 'GSB23_Q101', 'GSB23_Q10', 'GSB23_Q4',
       'GSB23_Q5', 'GSB23_Q6', 'GSB23_Q7', 'GSB23_Q8', 'GSB23_Q9', 'GSB23_Q11',
       'GSB23_Q17', 'GSB23_Q20', 'GSB23_Q21', 'GSB23_Q22', 'GSB23_Q24',
       'GSB23_Q26', 'GSB23_Q27', 'GSB23_Q30', 'GSB23_Q31', 'GSB23_Q32',
       'GSB23_Q33', 'GSB23_Q34', 'GSB23_Q38', 'GSB23_Q39', 'GSB23_Q41',
       'GSB23_Q43', 'GSB23_Q45', 'GSB23_Q46', 'GSB23_Q49', 'GSB23_Q50',
       'GSB23_Q52', 'GSB23_Q54', 'GSB23_Q55', 'GSB23_Q56', 'GSB23_Q58',
       'GSB23_Q61', 'GSB23_Q62', 'GSB23_Q63', 'GSB23_Q64', 'GSB23_Q1_1',
       'GSB23_Q2_1', 'GSB23_Q3_1', 'GSB23_Q12_1', 'GSB23_Q13_1', 'GSB23_Q14_1',
       'GSB23_Q15_1', 'GSB23_Q16_1', 'GSB23_Q18_1', 'GSB23_Q19_1',
       'GSB23_Q21_1', 'GSB23_Q23_1', 'GSB23_Q25_1', 'GSB23_Q36_1',
       'GSB23_Q51_1', 'GSB23_Q53_1', 'GSB23_Q59_1', 'GSB23_Q60_1',
       'GSB23_Q65_1', 'GSB23_Mode', 'GSB23_StartDate', 'GSB23_EndDate',
       'GSB23_Progress', 'GSB23_Finished'],
      dtype='obje

In [93]:
for feature in municipalities_data['features']:
    commune_id = feature['properties']['id']
    
    # Filtrer les réponses pour cette commune
    commune_response = df_commune_responses[df_commune_responses['GSB23_Q100'] == commune_id]
    
    if not commune_response.empty:
        # Ajouter la réponse au GeoJSON (par exemple pour une question)
        feature['properties']['response_Q1'] = commune_response['GSB23_Q1_1'].values[0]

df_commune_responses.to_csv('data/commune_responses.csv', index=False, encoding='utf-8')

In [97]:
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
df_combined['text_de'] = df_combined['label'].str.replace(r'^\d+\.\s*', '', regex=True)


model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

# translation function with mbart 
def translate_mbart(text, source_lang, target_lang):
    tokenizer.src_lang = source_lang
    # encode the text to be translated
    encoded_input = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)
    generated_tokens = model.generate(
        **encoded_input,
        forced_bos_token_id=tokenizer.lang_code_to_id[target_lang],
        max_length=512,
        num_beams=4,  # use beam search for better results
        early_stopping=True
    )
    # decode the generated tokens
    return tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]

df_combined['text_en'] = df_combined['text_de'].apply(lambda x: translate_mbart(x, "de_DE", "en_XX"))
df_combined['text_fr'] = df_combined['text_en'].apply(lambda x: translate_mbart(x, "en_XX", "fr_XX"))
df_combined['text_it'] = df_combined['text_en'].apply(lambda x: translate_mbart(x, "en_XX", "it_IT"))

In [100]:
df_combined.head()

Unnamed: 0,code_original,code,enquete,num_question,cat_reponse,year,label,type,format,text_de,text_fr,text_it,text_ro,text_en
0,GSB23_Q99,GSB23_Q99,GSB23,Q99,,2023,In welchem Kanton liegt Ihre Gemeinde?,continuous,numeric-40.0,In welchem Kanton liegt Ihre Gemeinde?,Dans quel canton se trouve votre municipalité?,In quale cantono si trova la vostra municipalità?,Translation not available for the moment,In which canton is your municipality located?
1,GSB23_Q100,GSB23_Q100,GSB23,Q100,,2023,In welcher Gemeinde sind Sie tätig? (falls Sie...,continuous,numeric-40.0,In welcher Gemeinde sind Sie tätig? (falls Sie...,Dans quelle municipalité êtes-vous actif? (si ...,In quale municipalità siete attivi? (se non tr...,Translation not available for the moment,In which municipality are you active? (if you ...
2,GSB23_Q101,GSB23_Q101,GSB23,Q101,,2023,"Falls Sie Ihre Gemeinde nicht gefunden haben, ...",discrete,character-200,"Falls Sie Ihre Gemeinde nicht gefunden haben, ...","Si vous n'avez pas trouvé votre municipalité, ...","Se non avete trovato la vostra municipalità, i...",Translation not available for the moment,"If you have not found your municipality, pleas..."
3,GSB23_Q10,GSB23_Q10,GSB23,Q10,,2023,1. Gemeinden können in unterschiedlichem Masse...,continuous,numeric-40.0,Gemeinden können in unterschiedlichem Masse vo...,Les collectivités peuvent être touchées par ce...,Le comunità possono essere influenzate da cert...,Translation not available for the moment,Communities may be affected by certain societa...
4,GSB23_Q4,GSB23_Q4,GSB23,Q4,,2023,4. Wie hat sich der Steuerfuss 2023 (die Steue...,continuous,numeric-40.0,Wie hat sich der Steuerfuss 2023 (die Steueran...,Comment a-t-on changé la base fiscale de votre...,Come è cambiata la base fiscale nel 2023 (il s...,Translation not available for the moment,How has the tax base in 2023 (the tax system) ...


In [99]:
df_combined['text_ro'] = 'Translation not available for the moment'

In [101]:
df_combined.to_csv('data/combined_df.csv', index=False, encoding='utf-8')