# create_sheets.ipynb

*Notebook that takes a single .xlsx file containing all the tables on separate sheets, cleans the data, and writes each sheet to its own CSV file for upload to ArcGIS*

## Imports

### Library imports

In [1]:
import pandas as pd 

### Define file/directory paths

**Modify these variables to match your environment**

In [2]:
input_dir:str = 'sheets/excel/'                 # Path to the directory containing the single Excel workbook (original_file)
output_dir:str = 'sheets/csvs/'                 # Path to the directory where the individual CSV files are written
original_file:str = 'Giudecca_Factories.xlsx'   # Filename of the original Excel file to clean and split

# Sheet names of interest in the Excel file
sheet_names:list[str] = [
    'Building',                         # Building (Entity table)
    'Giudecca_Pop_Over_Time',           # Population of Giudecca over time (Entity table)
    'Factory',                          # Factory (Entity table)
    'Factory_At_Building',              # Match a Factory to a Building (Relationship table)
    'Timeperiod',                       # Contextually significant timeperiods (Entity table) 
    'Product_Over_Time',                # Product(s) for each factory over time (Relationship table) 
    'Employment_Over_Time',             # Employment for each factory over time (Relationship table)
    'Photo_Sources'                     # Sources and links for all photos (Relationship table)
]
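
The paths above are plain strings joined with `+`, and `to_csv` typically raises an `OSError` when the target directory does not exist. A minimal sketch of one way to guard against that with `pathlib` follows. The helper name `prepare_paths` is hypothetical and not part of this notebook, and the demo runs in a temporary directory so it does not touch the real folders.

```python
import tempfile
from pathlib import Path

def prepare_paths(input_dir: str, output_dir: str, original_file: str):
    """Hypothetical helper: build the workbook path and make sure the CSV folder exists."""
    excel_path = Path(input_dir) / original_file
    out_path = Path(output_dir)
    # parents=True creates intermediate folders; exist_ok=True makes reruns harmless
    out_path.mkdir(parents=True, exist_ok=True)
    return excel_path, out_path

# Demonstrated in a throwaway directory rather than the real 'sheets/' tree
with tempfile.TemporaryDirectory() as tmp:
    excel_path, out_path = prepare_paths(
        f"{tmp}/sheets/excel/", f"{tmp}/sheets/csvs/", "Giudecca_Factories.xlsx"
    )
    created = out_path.is_dir()                   # True while the temp dir exists
    csv_name = (out_path / "Building.csv").name   # Path of one output file

print(excel_path.name, created, csv_name)
```

In the notebook itself, the same call would take `input_dir`, `output_dir` and `original_file` as defined in the cell above.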



**Do not modify these variables**

In [3]:

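# Read every sheet of interest in one pass: passing a list to sheet_name
# returns a dict that maps each sheet name to its DataFrame
dfs_dict:dict[str, pd.DataFrame] = pd.read_excel(
    input_dir + original_file, sheet_name=sheet_names
)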
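
If a name in `sheet_names` is misspelled or missing from the workbook, the load fails on that sheet. A small sketch of a check that could be run first is shown below. The function `missing_sheets` is hypothetical, and the two lists are made up for illustration; in the notebook, the available names would come from `pd.ExcelFile(input_dir + original_file).sheet_names`.

```python
def missing_sheets(requested, available):
    """Return the requested sheet names that are not in the workbook, keeping their order."""
    available_set = set(available)
    return [name for name in requested if name not in available_set]

# Made-up lists for illustration only
available = ['Building', 'Factory', 'Timeperiod']
requested = ['Building', 'Factory', 'Photo_Sources']

print(missing_sheets(requested, available))
```

An empty result means every requested sheet is present and the load can proceed.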

## Data cleaning

**Clean each DataFrame into a standard form**
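
The rule applied in the cell below (strip whitespace from strings, turn all-digit strings into integers, leave everything else alone) can be illustrated on a toy frame. This is only a sketch: the function name `clean_cell` and the data are invented for the example, and it uses column-wise `Series.map` rather than the notebook's own call.

```python
import pandas as pd

def clean_cell(value):
    """Strip strings, convert all-digit strings to int, return other values unchanged."""
    if isinstance(value, str):
        stripped = value.strip()
        return int(stripped) if stripped.isdigit() else stripped
    return value

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Building_ID": [" 3", "4 ", "5"],
    "Now_Used_For": ["  Hotel", "Residential  ", None],
})

# Apply the rule column by column
cleaned = toy.apply(lambda col: col.map(clean_cell))
print(cleaned)
```

Note that `str.isdigit` returns `False` for strings such as `"-5"` or `"3.5"`, so negative and decimal values stored as text stay strings under this rule.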

In [4]:
cleaned_dfs:dict[str, pd.DataFrame] = {}

# Iterate over dfs_dict and clean each df
for k, df in dfs_dict.items():
    # Strip leading and trailing whitespace and convert all-digit strings to int.
    # Note: applymap is deprecated since pandas 2.1 in favour of DataFrame.map.
    clean_df = df.applymap(lambda x:
                    int(x.strip()) if isinstance(x, str) and x.strip().isdigit()  # Convert to int if the val is all digits
                    else x.strip() if isinstance(x, str)                         # Remove whitespace
                    else x                                                       # Leave non-strings unchanged
                )

    print(clean_df)
    # reset_index returns a new DataFrame, so the result must be assigned
    clean_df = clean_df.reset_index(drop=True)
    # Keep the cleaned DF and write it to a CSV with the sheet name (key) as the filename
    cleaned_dfs[k] = clean_df
    clean_df.to_csv(output_dir + k + '.csv', index=False)
        

    Building_ID  Latitude   Longitude   Now_Used_For  \
0             3  45.427778   12.319167         Hotel   
1             4  45.425556   12.318056   Residential   
2             5  45.427778   12.320556       Factory   
3             6  45.426944   12.320833   Residential   
4             7  45.426667   12.321111           NaN   
5             8  45.426111   12.320000           NaN   
6             9  45.425833   12.319167   Residential   
7            10  45.425000   12.319167           NaN   
8            11  45.424444   12.320833           NaN   
9            12  45.424167   12.321111           NaN   
10           13  45.426944   12.323056   Residential   
11           14  45.425278   12.323056      Business   
12           15  45.423611   12.322778  Municipality   
13          151  45.423889   12.322222      Shipyard   
14          153  45.423889   12.323056      Shipyard   
15          154  45.423889   12.322778      Shipyard   
16          155  45.424167   12.322500      Ship