# Group Assignment
## Covid-19 Data - ETL Process
**Group Participants:**
- Uxía Lojo
- Emiliano Puertas
- María Camila Sanabria
- Joshua Vanderspuy
- Sebastian Zambrano

### **Part I: Extracting Process**
In this first part, we create a function that extracts each table and converts it into a data frame we can work with.
For this, we take into account the following requirement:
1. Build one large table containing all the information from the original CSV files.

We assume we will always be receiving all 6 files.
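Since the pipeline relies on all six tables being present, a cheap guard is to check the loaded dictionary before transforming it. This is a minimal sketch; the helper name `check_expected_tables` and the set `EXPECTED_TABLES` are hypothetical and assume the table names used later in this notebook.

```python
# Minimal sketch: verify that a dictionary of loaded tables contains the six
# tables the pipeline expects. The helper name is hypothetical.
EXPECTED_TABLES = {"demographics", "epidemiology", "health",
                   "hospitalizations", "index", "vaccinations"}


def check_expected_tables(tables: dict) -> None:
    """Raise a ValueError listing any expected table that is missing."""
    missing = EXPECTED_TABLES - set(tables)
    if missing:
        raise ValueError(f"Missing tables: {sorted(missing)}")


# Usage with placeholder values instead of real DataFrames
check_expected_tables({name: None for name in EXPECTED_TABLES})  # passes silently
```

Extra tables are tolerated; only missing ones raise, which matches the assumption that we always receive at least these six files.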

In [1]:
import pandas as pd
import os 

In [2]:
# We create the data extraction function
def data_extraction(directory):

    # Dictionary that will hold one DataFrame per table, keyed by table name
    tables = {}

    # We keep only the FILES in the directory, skipping any subfolders
    files = [file for file in os.listdir(directory) if os.path.isfile(os.path.join(directory, file))]

    # Read each file into a DataFrame and store it under its name without extension
    # (e.g. 'epidemiology.csv' -> 'epidemiology'), so the keys match the table names used later
    for file in files:
        tables[os.path.splitext(file)[0]] = pd.read_csv(os.path.join(directory, file))

    return tables
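To check how the dictionary keys come out, the same extraction logic can be run on a throwaway folder. This standalone sketch uses a hypothetical function name, `load_csv_folder`, and one made-up CSV file; it assumes the real files may or may not carry a `.csv` extension, which `os.path.splitext` handles either way.

```python
import os
import tempfile

import pandas as pd


def load_csv_folder(directory):
    """Read every regular file in `directory` into a DataFrame keyed by its base name."""
    tables = {}
    for file in os.listdir(directory):
        path = os.path.join(directory, file)
        if os.path.isfile(path):  # subfolders are skipped
            name = os.path.splitext(file)[0]  # 'health.csv' -> 'health'
            tables[name] = pd.read_csv(path)
    return tables


# Usage on a temporary folder containing one CSV file and one subfolder
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "health.csv"), "w") as f:
        f.write("location_key,life_expectancy\nUS,77.9\n")
    os.mkdir(os.path.join(tmp, "macrotable"))
    demo = load_csv_folder(tmp)

print(list(demo))  # ['health']
```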


Applying the function to the data/ folder that holds our data sets:

In [3]:
data=data_extraction("data/")

In [4]:
data

{'demographics':      location_key  population  population_male  population_female  \
 0     DE_BB_12051     72124.0          35617.0            36507.0   
 1     DE_BB_12052    100219.0          49201.0            51018.0   
 2     DE_BB_12053     57873.0          28023.0            29850.0   
 3     DE_BB_12054    178089.0          86179.0            91910.0   
 4     DE_BB_12060    182760.0          90615.0            92145.0   
 ...           ...         ...              ...                ...   
 5092  US_WY_56037     43464.0          22438.0            21026.0   
 5093  US_WY_56039     23384.0          12133.0            11251.0   
 5094  US_WY_56041     20431.0          10339.0            10092.0   
 5095  US_WY_56043      8010.0           4055.0             3955.0   
 5096  US_WY_56045      6968.0           3660.0             3308.0   
 
       population_rural  population_urban  population_largest_city  \
 0                  NaN               NaN                      NaN   
 1


### **Part II: Transformation process**

#### **_Extracting columns for our tables_**
For this part, we first scanned each data set to decide which columns to keep, looking at the ratio of missing values in each column and at its relevance to the analysis. We dropped columns that were redundant or had more than 50% of their values missing.
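The missing-value ratio used in that scan can be computed in one line per table. A minimal sketch on a made-up toy table (the values are illustrative, not from our data), keeping the columns with at most 50% missing values:

```python
import numpy as np
import pandas as pd

# Toy table standing in for one of the raw CSV files (made-up values)
df = pd.DataFrame({
    "location_key": ["DE", "ES", "IT", "US"],
    "population": [83.0, 47.0, np.nan, 331.0],
    "population_rural": [np.nan, np.nan, np.nan, 60.0],
})

# Share of missing values per column: isna() gives booleans, mean() turns them into a ratio
missing_ratio = df.isna().mean()

# Keep columns with at most 50% missing values
keep = missing_ratio[missing_ratio <= 0.5].index.tolist()
print(keep)  # ['location_key', 'population']
```

The relevance check (dropping redundant columns) is a judgment call and is not captured by this ratio.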

For this part, we assume that all the tables are being uploaded with all of the existing columns.

In [5]:
# First, we define which columns to keep for each table
column_dict={
    'demographics': ['location_key','population','population_male','population_female',
                     'population_age_00_09','population_age_10_19','population_age_20_29',
                     'population_age_30_39','population_age_40_49','population_age_50_59',
                     'population_age_60_69','population_age_70_79','population_age_80_and_older'],
    'epidemiology':['date','location_key','new_confirmed','new_deceased'],
    'health':['location_key','life_expectancy'],
    'hospitalizations':['date','location_key','new_hospitalized_patients'],
    'index':['location_key','country_name'],
    'vaccinations':['date','location_key','new_persons_fully_vaccinated']
}

In [6]:
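def assign_columns(tables:dict, column_dict:dict):

    # We go through each table in our dictionary and keep only the columns selected for it.
    # .copy() gives an independent DataFrame, so later in-place column assignments
    # do not trigger pandas' SettingWithCopyWarning.
    for key, value in tables.items():
        if key in column_dict:
            tables[key]=value[column_dict[key]].copy()

    return tables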

Applying our function to our dictionary of data frames:

In [7]:
data=assign_columns(data, column_dict)
data

{'demographics':      location_key  population  population_male  population_female  \
 0     DE_BB_12051     72124.0          35617.0            36507.0   
 1     DE_BB_12052    100219.0          49201.0            51018.0   
 2     DE_BB_12053     57873.0          28023.0            29850.0   
 3     DE_BB_12054    178089.0          86179.0            91910.0   
 4     DE_BB_12060    182760.0          90615.0            92145.0   
 ...           ...         ...              ...                ...   
 5092  US_WY_56037     43464.0          22438.0            21026.0   
 5093  US_WY_56039     23384.0          12133.0            11251.0   
 5094  US_WY_56041     20431.0          10339.0            10092.0   
 5095  US_WY_56043      8010.0           4055.0             3955.0   
 5096  US_WY_56045      6968.0           3660.0             3308.0   
 
       population_age_00_09  population_age_10_19  population_age_20_29  \
 0                   6029.0                5183.0                66

Since we will only join our data by country, we transform the location_key column to keep only its first two characters, which represent the country. This also avoids issues when aggregating our information.
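A quick illustration of the truncation on keys shaped like the sample rows above, assuming every key starts with a two-letter country code. Splitting on the first underscore gives the same result for these keys.

```python
import pandas as pd

keys = pd.Series(["DE_BB_12051", "US_WY_56045", "IT", "ES_AN"])

# First two characters = two-letter country code
country = keys.str[:2]
print(country.tolist())  # ['DE', 'US', 'IT', 'ES']

# Equivalent for these keys: everything before the first underscore
country_alt = keys.str.split("_").str[0]
print(country_alt.tolist())  # ['DE', 'US', 'IT', 'ES']
```

The slicing version is simpler and relies only on the fixed-width prefix; the split version would also work if a country code were ever longer than two characters.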

In [8]:
def loc_key_transformation(tables:dict):
    location='location_key'
    for key,value in tables.items():
        # Keep only the two-letter country code, e.g. 'DE_BB_12051' -> 'DE'
        if location in value.columns:
            value[location]=value[location].str[:2]
        tables[key]=value
    return tables


In [9]:
data=loc_key_transformation(data)

#### **_Dropping duplicates and empty rows function_**
For this part, we drop any duplicate rows and any completely empty rows from all the tables.

In [10]:
def dropping_dup_empty (tables: dict):
    for key,value in tables.items():
        value=value.drop_duplicates()
        value=value.dropna(how='all')
        tables[key]=value
    return tables
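Because loc_key_transformation runs before this step, two rows from different regions of the same country can become identical once the key is truncated, and drop_duplicates would then keep only one of them. The toy example below uses made-up values to show the effect; whether it matters for the real data depends on how often regional rows coincide. Deduplicating on the full key before truncating avoids it.

```python
import pandas as pd

# Two different German regions reporting the same numbers on the same day (made-up values)
epi = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-01"],
    "location_key": ["DE_BY", "DE_BE"],
    "new_confirmed": [5, 5],
})

# Truncating first makes the rows identical, so drop_duplicates removes one
truncated_first = epi.assign(location_key=epi["location_key"].str[:2]).drop_duplicates()
print(truncated_first["new_confirmed"].sum())  # 5

# Deduplicating on the full key first keeps both regional rows
dedup_first = epi.drop_duplicates()
dedup_first = dedup_first.assign(location_key=dedup_first["location_key"].str[:2])
print(dedup_first["new_confirmed"].sum())  # 10
```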

Using our function on our data set

In [11]:
data=dropping_dup_empty(data)
data

{'demographics':      location_key  population  population_male  population_female  \
 0              DE     72124.0          35617.0            36507.0   
 1              DE    100219.0          49201.0            51018.0   
 2              DE     57873.0          28023.0            29850.0   
 3              DE    178089.0          86179.0            91910.0   
 4              DE    182760.0          90615.0            92145.0   
 ...           ...         ...              ...                ...   
 5092           US     43464.0          22438.0            21026.0   
 5093           US     23384.0          12133.0            11251.0   
 5094           US     20431.0          10339.0            10092.0   
 5095           US      8010.0           4055.0             3955.0   
 5096           US      6968.0           3660.0             3308.0   
 
       population_age_00_09  population_age_10_19  population_age_20_29  \
 0                   6029.0                5183.0                66

#### **_Data Cleaning_**
For this part, we fill null values and drop rows that cannot be used.

- **'location_key'**: If it is empty, we cannot tell where the data comes from, so we drop the row.
- **'date'**: If it is empty, we cannot tell when the data was recorded, so we drop the row.
- **'population'** (and the population breakdown columns): If empty, we replace it with the median of the same location.
- **'new_confirmed'**: If empty, we replace it with 0, assuming the NaN means no cases were reported.
- **'new_deceased'**: If empty, we replace it with 0, assuming the NaN means no deaths were reported.
- **'life_expectancy'**: If empty, we replace it with the median of the same location.
- **'new_hospitalized_patients'**: If empty, we replace it with 0, assuming the NaN means no hospitalizations were reported.
- **'new_persons_fully_vaccinated'**: If empty, we replace it with 0, assuming the NaN means no vaccinations were reported.

To do this, we write a separate function for each purpose and then combine them in a master function.

### **_Cleaning empty location_key or date values_**
We drop rows that do not have a location_key or date because it would be very hard to identify where the information comes from.

In [12]:
def drop_empty(tables:dict):

    for key,value in tables.items():
        #Check if the table has the column location key before applying the drop function to avoid errors.
        if 'location_key' in value.columns: 
            value=value.dropna(subset=['location_key'])
        #Check if the table has the column date before applying the drop function to avoid errors.
        if 'date' in value.columns:
            value=value.dropna(subset=['date'])
        #Reassigning the corrected table in our tables dictionary
        tables[key]=value    
        
    return tables
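The two dropna calls in drop_empty can also be written as a single call over whichever key columns a table has. A minimal sketch on a made-up toy table:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": ["2020-03-01", None, "2020-03-02"],
    "location_key": ["DE", "ES", np.nan],
    "new_confirmed": [1, 2, 3],
})

# Only the key columns that exist in this table; dropna(subset=...) drops a row
# if any of those columns is missing
subset = [c for c in ("location_key", "date") if c in df.columns]
clean = df.dropna(subset=subset)
print(clean["new_confirmed"].tolist())  # [1]
```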


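### **_Filter dates and countries_**
This optional helper keeps only the rows between two dates (inclusive) and, in the index table, only the listed countries; the other tables lose the excluded countries later, at the inner join. It is not called by the master function below.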

In [13]:
def filter(tables:dict, start:str, end:str, countries:list):
    # Note: this name shadows Python's built-in filter() inside the notebook.
    # We filter countries in the index table (skipped if no countries are given)
    if 'index' in tables and countries:
        tables['index']=tables['index'][tables['index'].country_name.isin(countries)]
    # We keep only the dates between start and end, inclusive
    for key,value in tables.items():
        if 'date' in value.columns: # Check if there is a date column in our table
            value=value[(value['date']>=start)&(value['date']<=end)]
        tables[key]=value
    return tables
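The date filter works on the raw text dates because ISO 8601 strings (YYYY-MM-DD) sort in the same order as the dates they represent. This sketch, on made-up dates, shows the string comparison and the same filter after conversion to datetime, where pandas parses the string bounds.

```python
import pandas as pd

dates = pd.Series(["2019-12-31", "2020-03-15", "2022-09-01"])

# String comparison: valid because YYYY-MM-DD sorts chronologically
mask = (dates >= "2020-01-01") & (dates <= "2022-08-31")
filtered = dates[mask]
print(filtered.tolist())  # ['2020-03-15']

# Same filter on datetime64 values; the string bounds are parsed as timestamps
as_dt = pd.to_datetime(dates)
filtered_dt = as_dt[(as_dt >= "2020-01-01") & (as_dt <= "2022-08-31")]
print(filtered_dt.dt.strftime("%Y-%m-%d").tolist())  # ['2020-03-15']
```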

### **_Fill Population and Life Expectancy Data_**
We fill these columns with the median of each location; the median is less sensitive to outliers than the mean.

In [14]:
def fill_median(tables:dict):
    # List of columns we will be changing
    change_column=['population','population_male','population_female','population_age_00_09',
                 'population_age_10_19','population_age_20_29','population_age_30_39','population_age_40_49',
                 'population_age_50_59','population_age_60_69','population_age_70_79','population_age_80_and_older',
                 'life_expectancy']

    # We loop through each table and column and fill the gaps where needed

    for key,value in tables.items(): # Accessing the tables in our dictionary
        for col in change_column: # Accessing the list of columns
            if col in value.columns: # Checking if the column is in our table
                med=value.groupby('location_key')[col].median() # Median of the column for each location
                value[col]=value[col].fillna(value['location_key'].map(med)) # Fill gaps with the median of the row's location
                # map looks up each row's location_key in med (indexed by location)
                # and returns the corresponding median value

        tables[key]=value

    return tables
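An equivalent way to get each row's group median is groupby(...).transform('median'), which returns a Series already aligned with the original rows, so no map step is needed. A minimal sketch on made-up values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "location_key": ["DE", "DE", "DE", "US", "US"],
    "life_expectancy": [80.0, np.nan, 82.0, 78.0, np.nan],
})

# transform('median') returns one value per row: the median of that row's group
# (NaN values are ignored when computing the median)
filled = df["life_expectancy"].fillna(
    df.groupby("location_key")["life_expectancy"].transform("median")
)
print(filled.tolist())  # [80.0, 81.0, 82.0, 78.0, 78.0]
```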
    

### **_Fill New Cases_**
We will fill this information with 0 because we assume the NaN is due to no cases being reported.

In [15]:
def fill_zero(tables:dict):
    # List of columns we will be changing
    change_column=['new_confirmed','new_deceased','new_hospitalized_patients','new_persons_fully_vaccinated']

    # We loop through each table and column and fill the gaps where needed

    for key,value in tables.items(): # Accessing the tables in our dictionary
        for col in change_column: # Accessing the list of columns
            if col in value.columns: # Checking if the column is in our table
                value[col]=value[col].fillna(0) # Filling with 0

        tables[key]=value

    return tables
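fillna also accepts a {column: value} mapping, which fills several columns in one call. A minimal sketch on made-up values, listing only the columns present in the table:

```python
import numpy as np
import pandas as pd

ZERO_FILL = ["new_confirmed", "new_deceased",
             "new_hospitalized_patients", "new_persons_fully_vaccinated"]

df = pd.DataFrame({
    "location_key": ["US", "US"],
    "new_hospitalized_patients": [np.nan, 12.0],
    "new_persons_fully_vaccinated": [100.0, np.nan],
})

# One fillna call with a {column: fill value} mapping for the columns this table has
filled = df.fillna({col: 0 for col in ZERO_FILL if col in df.columns})
print(filled["new_hospitalized_patients"].tolist())     # [0.0, 12.0]
print(filled["new_persons_fully_vaccinated"].tolist())  # [100.0, 0.0]
```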


### **_Week Column Creation_**
Now, we create a 'week' column holding the Monday-to-Sunday week that each date falls in, and then drop the 'date' columns.

First, we convert all date columns to datetime, since they are currently stored as text (object dtype).

In [16]:
def date_transformation(tables:dict):
    dt='date' # Assign the column name to a variable for simplicity
    for key,value in tables.items(): # Loop through our tables
        if dt in value.columns: # Only tables that have a 'date' column
            value[dt]=pd.to_datetime(value[dt]) # Convert the column to datetime
        tables[key]=value # Assign the updated table to its key
    return tables

Next, we add a week column to each table that has dates and drop its date column.

In [17]:
def week_dates(dt):
    st_date=(dt-pd.Timedelta(days=dt.weekday())).date() # Start (Monday) of the week this date belongs to
    end_date=(dt+pd.Timedelta(days=6-dt.weekday())).date() # End (Sunday) of the week this date belongs to
    return f"{st_date}/{end_date}" # Format the result as "start/end"
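A worked example of week_dates (repeated here so the snippet runs on its own): weekday() returns 0 for Monday and 6 for Sunday, so subtracting it lands on the Monday and adding 6 minus it lands on the Sunday.

```python
import pandas as pd


def week_dates(dt):
    # Monday of the week: go back dt.weekday() days (Monday == 0)
    st_date = (dt - pd.Timedelta(days=dt.weekday())).date()
    # Sunday of the week: go forward 6 - dt.weekday() days
    end_date = (dt + pd.Timedelta(days=6 - dt.weekday())).date()
    return f"{st_date}/{end_date}"


# 1 January 2020 was a Wednesday (weekday() == 2)
print(week_dates(pd.Timestamp("2020-01-01")))  # 2019-12-30/2020-01-05
# A Sunday stays in the week that started the previous Monday
print(week_dates(pd.Timestamp("2020-01-05")))  # 2019-12-30/2020-01-05
```

The first label matches the first week in the joined table further down.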

In [18]:
def week_column(tables:dict):
    for key,value in tables.items(): # Loop through our tables

        if 'date' in value.columns: # Only tables that have a 'date' column

            value["week"] = value['date'].apply(week_dates) # Apply the previous function to build the week column

            value = value.drop(columns=['date']) # Drop the date column
        tables[key]=value  # Assign the new table to its key
    return tables
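As a vectorized alternative to apply(week_dates), pandas periods with the weekly frequency 'W' (an alias of 'W-SUN', weeks ending on Sunday) print as "Monday/Sunday", which appears to be the same label format. A minimal sketch, assuming this string form is acceptable as the week key:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2020-01-01", "2020-01-05", "2020-01-06"]))

# Weekly periods ending on Sunday; their string form is "<Monday>/<Sunday>"
weeks = dates.dt.to_period("W").astype(str)
print(weeks.tolist())
# ['2019-12-30/2020-01-05', '2019-12-30/2020-01-05', '2020-01-06/2020-01-12']
```

This avoids a Python-level function call per row, which may matter on large tables.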

### **_Turning all of the functions into a master function_**

In [19]:
def cleaning_data(tables:dict):
    tables=drop_empty(tables)
    tables=fill_median(tables)
    tables=fill_zero(tables)
    tables=date_transformation(tables)
    tables=week_column(tables)
    return tables
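The master function can also be written as an ordered list of steps folded over the tables, so steps can be added or reordered in one place. A minimal sketch; run_pipeline, add_a and add_b are hypothetical names, and the toy steps stand in for drop_empty, fill_median and the rest.

```python
from functools import reduce


def run_pipeline(tables, steps):
    """Pass `tables` through each step in order, feeding each output into the next step."""
    return reduce(lambda acc, step: step(acc), steps, tables)


# Usage with two toy steps on a plain dict
def add_a(d):
    return {**d, "a": 1}


def add_b(d):
    return {**d, "b": d["a"] + 1}


print(run_pipeline({}, [add_a, add_b]))  # {'a': 1, 'b': 2}
```

A step that needs extra arguments, such as the filter helper, could be bound with functools.partial before being added to the list.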

Applying the master function to our data:

In [20]:
data=cleaning_data(data)
data

{'demographics':      location_key  population  population_male  population_female  \
 0              DE     72124.0          35617.0            36507.0   
 1              DE    100219.0          49201.0            51018.0   
 2              DE     57873.0          28023.0            29850.0   
 3              DE    178089.0          86179.0            91910.0   
 4              DE    182760.0          90615.0            92145.0   
 ...           ...         ...              ...                ...   
 5092           US     43464.0          22438.0            21026.0   
 5093           US     23384.0          12133.0            11251.0   
 5094           US     20431.0          10339.0            10092.0   
 5095           US      8010.0           4055.0             3955.0   
 5096           US      6968.0           3660.0             3308.0   
 
       population_age_00_09  population_age_10_19  population_age_20_29  \
 0                   6029.0                5183.0                66

### **_Aggregations_**

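First, we aggregate each table by country (and by week where a week column exists): counts and population figures are summed, while life expectancy in the health table is averaged.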

In [22]:
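def aggregations(tables:dict):

    for key,value in tables.items():
        if 'week' in value.columns and 'location_key' in value.columns:
            # Weekly tables: add up the regional counts per week and country
            value=value.groupby(by=['week','location_key'], as_index=False).sum()
        elif 'location_key' in value.columns:
            if key=='health':
                # Life expectancy is averaged across regions, not summed
                value=value.groupby(by='location_key', as_index=False).mean()
            elif key=='index':
                # One country name per location; summing would concatenate the strings
                value=value.groupby(by='location_key', as_index=False).first()
            else:
                value=value.groupby(by='location_key', as_index=False).sum()
        tables[key]=value
    return tables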
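A toy illustration (made-up values) of the two aggregation rules: counts are added up per week and country, while life expectancy, being a per-person measure, is averaged. Note this is an unweighted mean across regions.

```python
import pandas as pd

epi = pd.DataFrame({
    "week": ["2020-01-06/2020-01-12"] * 3,
    "location_key": ["DE", "DE", "US"],
    "new_confirmed": [3, 4, 10],
})
health = pd.DataFrame({"location_key": ["US", "US"], "life_expectancy": [78.0, 80.0]})

# Counts: regional rows are summed per (week, country)
weekly = epi.groupby(["week", "location_key"], as_index=False).sum()
print(weekly["new_confirmed"].tolist())  # [7, 10]

# Life expectancy: regions are averaged
health_mean = health.groupby("location_key", as_index=False).mean()
print(health_mean["life_expectancy"].tolist())  # [79.0]
```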

In [34]:
data['index']

,location_key,country_name
0,DE,Germany
1,ES,Spain
2,IT,Italy
3,US,United States of America


In [23]:
data=aggregations(data)
data

{'demographics':   location_key   population  population_male  population_female  \
 0           DE   82786787.0       40126479.0         41172726.0   
 1           ES   19357122.0        8224314.0          8300422.0   
 2           IT   55443101.0       26986652.0         28456449.0   
 3           US  341338766.0      167361582.0        172779253.0   
 
    population_age_00_09  population_age_10_19  population_age_20_29  \
 0             7401202.0             7586334.0             9513883.0   
 1             1513858.0             1713301.0             1703871.0   
 2             4633566.0             5231560.0             5623413.0   
 3            42034674.0            43600405.0            47667343.0   
 
    population_age_30_39  population_age_40_49  population_age_50_59  \
 0            10265460.0            10205383.0            13258896.0   
 1             2111146.0             2739366.0             2474428.0   
 2             6471216.0             8507701.0             86106

### **_Joins_**
Now, we join our tables into a single table with one row per week and country.

In [24]:
def joins(tables:dict):
    # First, we left-join epidemiology with hospitalizations, so weeks without hospitalization data are kept
    t1=pd.merge(tables['epidemiology'],tables['hospitalizations'],how="left", on=['week','location_key'])
    # Then we left-join the result with vaccinations, again keeping every epidemiology week
    t2=pd.merge(t1,tables['vaccinations'],how="left",on=['week','location_key'])
    # Then we left-join the result with health (one life expectancy value per country)
    t3=pd.merge(t2,tables['health'],how="left",on='location_key')
    # Then we inner-join our reference tables, demographics and index
    t4=pd.merge(tables['demographics'],tables['index'],how="inner",on='location_key')
    # Finally, an inner join of t3 and t4 keeps only the countries present in both
    final_table=pd.merge(t3,t4,how="inner",on='location_key')
    return final_table
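A toy example (made-up values) of how the two join types behave here: the left join keeps epidemiology weeks that have no hospitalization row and fills them with NaN, and merge's indicator=True flag shows where each row came from; the inner join drops any country missing from the other table.

```python
import pandas as pd

epi = pd.DataFrame({"week": ["w1", "w2"], "location_key": ["DE", "DE"], "new_confirmed": [1, 2]})
hosp = pd.DataFrame({"week": ["w2"], "location_key": ["DE"], "new_hospitalized_patients": [5]})
index = pd.DataFrame({"location_key": ["US"], "country_name": ["United States of America"]})

# Left join: week w1 is kept, with NaN in new_hospitalized_patients
left = pd.merge(epi, hosp, how="left", on=["week", "location_key"], indicator=True)
print(left["_merge"].tolist())  # ['left_only', 'both']

# Inner join: DE has no match in this index table, so every row is dropped
inner = pd.merge(epi, index, how="inner", on="location_key")
print(len(inner))  # 0
```

This is also why countries removed from the index table by the filter helper disappear from the final table.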

In [25]:
os.getcwd()

'c:\\Users\\HP\\Documents\\MBD\\Courses\\Python for Data Analysis\\PDAI\\Group 4'

In [26]:
table=joins(data)
table

,week,location_key,new_confirmed,new_deceased,new_hospitalized_patients,new_persons_fully_vaccinated,life_expectancy,population,population_male,population_female,population_age_00_09,population_age_10_19,population_age_20_29,population_age_30_39,population_age_40_49,population_age_50_59,population_age_60_69,population_age_70_79,population_age_80_and_older,country_name
0,2019-12-30/2020-01-05,DE,1.0,0.0,,,,82786787.0,40126479.0,41172726.0,7401202.0,7586334.0,9513883.0,10265460.0,10205383.0,13258896.0,10159451.0,7543815.0,5313340.0,Germany
1,2020-01-13/2020-01-19,DE,1.0,0.0,,,,82786787.0,40126479.0,41172726.0,7401202.0,7586334.0,9513883.0,10265460.0,10205383.0,13258896.0,10159451.0,7543815.0,5313340.0,Germany
2,2020-01-20/2020-01-26,DE,2.0,0.0,,,,82786787.0,40126479.0,41172726.0,7401202.0,7586334.0,9513883.0,10265460.0,10205383.0,13258896.0,10159451.0,7543815.0,5313340.0,Germany
3,2020-01-20/2020-01-26,US,0.0,0.0,,,77.973595,341338766.0,167361582.0,172779253.0,42034674.0,43600405.0,47667343.0,45409648.0,42444067.0,45231261.0,38309532.0,22458802.0,12982024.0,United States of America
4,2020-01-27/2020-02-02,DE,6.0,0.0,,,,82786787.0,40126479.0,41172726.0,7401202.0,7586334.0,9513883.0,10265460.0,10205383.0,13258896.0,10159451.0,7543815.0,5313340.0,Germany
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
500,2022-08-08/2022-08-14,IT,154987.0,0.0,,,,55443101.0,26986652.0,28456449.0,4633566.0,5231560.0,5623413.0,6471216.0,8507701.0,8610615.0,6764526.0,5533564.0,4066940.0,Italy
501,2022-08-08/2022-08-14,US,122778.0,502.0,1548.0,240164.0,77.973595,341338766.0,167361582.0,172779253.0,42034674.0,43600405.0,47667343.0,45409648.0,42444067.0,45231261.0,38309532.0,22458802.0,12982024.0,United States of America
502,2022-08-15/2022-08-21,IT,127160.0,0.0,,,,55443101.0,26986652.0,28456449.0,4633566.0,5231560.0,5623413.0,6471216.0,8507701.0,8610615.0,6764526.0,5533564.0,4066940.0,Italy
503,2022-08-15/2022-08-21,US,107459.0,400.0,669.0,180458.0,77.973595,341338766.0,167361582.0,172779253.0,42034674.0,43600405.0,47667343.0,45409648.0,42444067.0,45231261.0,38309532.0,22458802.0,12982024.0,United States of America


In [27]:
table.describe()

,new_confirmed,new_deceased,new_hospitalized_patients,new_persons_fully_vaccinated,life_expectancy,population,population_male,population_female,population_age_00_09,population_age_10_19,population_age_20_29,population_age_30_39,population_age_40_49,population_age_50_59,population_age_60_69,population_age_70_79,population_age_80_and_older
count,505.0,505.0,130.0,69.0,135.0,505.0,505.0,505.0,505.0,505.0,505.0,505.0,505.0,505.0,505.0,505.0,505.0
mean,201999.4,1885.332673,2519.561538,1669640.0,77.97359,128734300.0,62645180.0,64724000.0,14449480.0,15114410.0,16724540.0,16609360.0,16490880.0,17866680.0,14710490.0,9452236.0,5944872.0
std,477495.8,3911.875903,3756.430061,2493827.0,1.28374e-13,130410100.0,64277000.0,66332970.0,16801950.0,17344490.0,18900910.0,17639110.0,15928360.0,16961490.0,14554270.0,8159326.0,4527477.0
min,-412.0,0.0,2.0,-215177.0,77.97359,19357120.0,8224314.0,8300422.0,1513858.0,1713301.0,1703871.0,2111146.0,2739366.0,2474428.0,1899658.0,1399847.0,992477.0
25%,9289.0,0.0,591.0,259265.0,77.97359,19357120.0,8224314.0,8300422.0,1513858.0,1713301.0,1703871.0,2111146.0,2739366.0,2474428.0,1899658.0,1399847.0,992477.0
50%,49865.0,28.0,1276.0,900041.0,77.97359,55443100.0,26986650.0,28456450.0,4633566.0,5231560.0,5623413.0,6471216.0,8507701.0,8610615.0,6764526.0,5533564.0,4066940.0
75%,196348.0,1460.0,2683.5,1796788.0,77.97359,341338800.0,167361600.0,172779300.0,42034670.0,43600400.0,47667340.0,45409650.0,42444070.0,45231260.0,38309530.0,22458800.0,12982020.0
max,5153090.0,22378.0,23486.0,16347200.0,77.97359,341338800.0,167361600.0,172779300.0,42034670.0,43600400.0,47667340.0,45409650.0,42444070.0,45231260.0,38309530.0,22458800.0,12982020.0


In [28]:
macrotable=pd.read_csv("C:/Users/HP/Documents/MBD/Courses/Python for Data Analysis/PDAI/Group Assignment/data/macrotable/macrotable_c")
macrotable.describe()

,new_confirmed,new_deceased,new_deceased_confirmed_ratio,population,population_age_00_09,population_age_10_19,population_age_20_29,population_age_30_39,population_age_40_49,population_age_50_59,population_age_60_69,population_age_70_79,population_age_80_and_older,life_expectancy,new_hospitalized_patients,new_persons_fully_vaccinated
count,504.0,504.0,504.0,504.0,504.0,504.0,504.0,504.0,504.0,504.0,504.0,504.0,504.0,135.0,135.0,135.0
mean,246592.3,2286.992063,0.004206,129322000.0,14641820.0,15345430.0,17006170.0,16936720.0,16849630.0,18199740.0,14923810.0,9584126.0,6019731.0,77.872,2544.6,1021703.0
std,540648.3,4633.63737,0.014912,130128400.0,16788270.0,17301400.0,18858620.0,17547580.0,15772860.0,16820290.0,14477750.0,8099128.0,4497196.0,1.569016e-13,3682.569298,2132879.0
min,-67.0,0.0,0.0,21098460.0,1915634.0,2266531.0,2337994.0,2942982.0,3758505.0,3351295.0,2417174.0,1717090.0,1152301.0,77.872,4.0,-168261.0
25%,16329.25,0.0,0.0,21098460.0,1915634.0,2266531.0,2337994.0,2942982.0,3758505.0,3351295.0,2417174.0,1717090.0,1152301.0,77.872,616.0,0.0
50%,71022.5,31.0,0.0,55443100.0,4633566.0,5231560.0,5623413.0,6471216.0,8507701.0,8610615.0,6764526.0,5533564.0,4066940.0,77.872,1336.0,43759.0
75%,250791.0,1934.5,0.000715,341338800.0,42185400.0,43751020.0,47870800.0,45602370.0,42602090.0,45373660.0,38404010.0,22505390.0,13013870.0,77.872,2591.3,1259466.0
max,5656738.0,25873.0,0.25,341338800.0,42185400.0,43751020.0,47870800.0,45602370.0,42602090.0,45373660.0,38404010.0,22505390.0,13013870.0,77.872,23486.0,16765960.0


In [29]:
macrotable

,week,country_name,new_confirmed,new_deceased,new_deceased_confirmed_ratio,population,population_age_00_09,population_age_10_19,population_age_20_29,population_age_30_39,population_age_40_49,population_age_50_59,population_age_60_69,population_age_70_79,population_age_80_and_older,life_expectancy,new_hospitalized_patients,new_persons_fully_vaccinated
0,2019-12-30/2020-01-05,Germany,1.0,0.0,0.000000,82786787.0,7.539514e+06,7.725134e+06,9.713905e+06,1.046693e+07,1.039496e+07,1.350341e+07,1.034477e+07,7.684878e+06,5.413285e+06,,,
1,2020-01-13/2020-01-19,Germany,1.0,0.0,0.000000,82786787.0,7.539514e+06,7.725134e+06,9.713905e+06,1.046693e+07,1.039496e+07,1.350341e+07,1.034477e+07,7.684878e+06,5.413285e+06,,,
2,2020-01-20/2020-01-26,Germany,2.0,0.0,0.000000,82786787.0,7.539514e+06,7.725134e+06,9.713905e+06,1.046693e+07,1.039496e+07,1.350341e+07,1.034477e+07,7.684878e+06,5.413285e+06,,,
3,2020-01-20/2020-01-26,United States of America,0.0,0.0,0.000000,341338766.0,4.218540e+07,4.375102e+07,4.787080e+07,4.560237e+07,4.260209e+07,4.537366e+07,3.840401e+07,2.250539e+07,1.301387e+07,77.871999,2544.6,0.0
4,2020-01-27/2020-02-02,Germany,10.0,0.0,0.000000,82786787.0,7.539514e+06,7.725134e+06,9.713905e+06,1.046693e+07,1.039496e+07,1.350341e+07,1.034477e+07,7.684878e+06,5.413285e+06,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
499,2022-08-01/2022-08-07,United States of America,157249.0,489.0,0.000922,341338766.0,4.218540e+07,4.375102e+07,4.787080e+07,4.560237e+07,4.260209e+07,4.537366e+07,3.840401e+07,2.250539e+07,1.301387e+07,77.871999,1810.0,-168261.0
500,2022-08-08/2022-08-14,Italy,169974.0,0.0,0.000000,55443101.0,4.633566e+06,5.231560e+06,5.623413e+06,6.471216e+06,8.507701e+06,8.610615e+06,6.764526e+06,5.533564e+06,4.066940e+06,,,
501,2022-08-08/2022-08-14,United States of America,126631.0,502.0,0.001763,341338766.0,4.218540e+07,4.375102e+07,4.787080e+07,4.560237e+07,4.260209e+07,4.537366e+07,3.840401e+07,2.250539e+07,1.301387e+07,77.871999,1598.0,324776.0
502,2022-08-15/2022-08-21,Italy,137729.0,0.0,0.000000,55443101.0,4.633566e+06,5.231560e+06,5.623413e+06,6.471216e+06,8.507701e+06,8.610615e+06,6.764526e+06,5.533564e+06,4.066940e+06,,,


## **File Export**

In [30]:
os.getcwd()

'c:\\Users\\HP\\Documents\\MBD\\Courses\\Python for Data Analysis\\PDAI\\Group 4'

In [31]:
def export(table:pd.DataFrame, directory:str):
    # Write the final table to <directory>/macrotable.csv if that directory exists;
    # otherwise fall back to the current working directory.
    if os.path.exists(directory):
        path=os.path.join(directory,"macrotable.csv")
    else:
        path=os.path.join(os.getcwd(),"macrotable.csv")
    table.to_csv(path, index=False)
    return path
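A quick round-trip check of the export settings on a toy table (made-up values) in a temporary folder: with index=False, reading the file back gives the same table, whereas writing the index would add an extra unnamed column ('Unnamed: 0') on reload.

```python
import os
import tempfile

import pandas as pd

table = pd.DataFrame({"week": ["2020-01-06/2020-01-12"], "location_key": ["DE"], "new_confirmed": [7.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "macrotable.csv")
    table.to_csv(path, index=False)  # index=False: no extra index column in the file
    round_trip = pd.read_csv(path)

print(round_trip.equals(table))  # True
```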


In [32]:
export(table,"data/")