## CES Data Preprocessing

The CES dataset contains observations from the Consumer Expenditure Survey (CES), conducted by a Brazilian institute in 1947.

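Each row of the raw data has two columns:

`transactionID`, `value`

**transactionID**: Identifies a single family. All rows that share an ID belong to the same family.

**value**: One of `City`, `Income`, `Members in family` or `Product Items`. The first three carry a `City_`, `Income_` or `Members_` prefix; product items have no prefix.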

Example of the rows for a single family:

|  transactionID | value  |
|---|---|
| 400071  |  City_Curitiba |
| 400071  |  Income_above_43  |
| 400071  |  Members_5  |
| 400071  |  banana  |
| 400071  |  black_beans  |
| 400071  |  potato  |
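
The prefix convention shown in the table can be sketched as a small helper. `classify_value` and `PREFIXES` are hypothetical names, not part of the original script, and the sketch assumes that any value without one of the three prefixes is a product item.

```python
# Hypothetical helper, not part of the original script.
PREFIXES = {"City_": "city", "Income_": "income", "Members_": "members"}

def classify_value(value):
    """Return (field, cleaned_value) for one entry of the `value` column."""
    for prefix, field in PREFIXES.items():
        if value.startswith(prefix):
            # Strip only the leading prefix, keep the rest unchanged.
            return field, value[len(prefix):]
    # Assumption: anything without a known prefix is a product item.
    return "description", value

for raw in ["City_Curitiba", "Income_above_43", "Members_5", "banana"]:
    print(raw, "->", classify_value(raw))
```

Matching on the prefix rather than on row position keeps the classification independent of the order in which a family's rows appear.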


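The script below flattens this data into one row per product item, repeating the family's city, income, and member count on every row. The output file has the following format (the script writes the transaction ID under the column name `id`):

| id | city | income | members | description |
|---|---|---|---|---|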
| 400071  |  Curitiba | above_43 | 5 | banana |
| 400071  |  Curitiba | above_43 | 5 | black_beans |
| 400071  |  Curitiba | above_43 | 5 | potato |


For more details, refer to the [`CES DATASET FOR ASSOCIATION RULES.pdf`](https://github.com/alokg1019/DataMining_Team1/blob/master/data/CES/CES%20DATASET%20FOR%20ASSOCIATION%20RULES.pdf) file in the `data/CES` folder.

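```python
import csv

# Make sure the path to the data file is correct.
# File in the repository: data/CES/ces_hybrid.csv
DATA_PATH = "/content/ces_hybrid.csv"

# Map each transactionID to [transactionID, value_1, value_2, ...],
# keeping the values in the order they appear in the file.
all_dict = {}
with open(DATA_PATH, newline="") as f:
    for row in csv.reader(f):
        if row[0] not in all_dict:
            all_dict[row[0]] = [row[0], row[1]]
        else:
            all_dict[row[0]].append(row[1])

# Path of the flattened output file
OUTPUT_PATH = "preprocessed_ces_hybrid.csv"
# newline="" stops the csv module from writing blank lines between rows on Windows
with open(OUTPUT_PATH, mode="w", newline="") as csv_file:
    fieldnames = ["id", "city", "income", "members", "description"]
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()

    for v in all_dict.values():
        # Assumes each family's first three values are City_, Income_ and Members_, in that order.
        temp_row = {
            "id": v[0],
            "city": v[1].replace("City_", ""),
            "income": v[2].replace("Income_", ""),
            "members": v[3].replace("Members_", ""),
        }
        # The remaining values are product items: write one row per item.
        for item in v[4:]:
            temp_row["description"] = item
            writer.writerow(temp_row)

print("Done")
```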

Output: `Done`
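
The script relies on the city, income, and member-count values being the first three entries for every family. As a hedged alternative, the sketch below assigns each value by its prefix instead of its position. `flatten_rows` and `PREFIXES` are hypothetical names, the sample rows are the example record from above (deliberately shuffled), and treating every unprefixed value as a product item is an assumption.

```python
from collections import defaultdict

# Hypothetical names; this is a sketch, not the original script.
PREFIXES = {"City_": "city", "Income_": "income", "Members_": "members"}

def flatten_rows(rows):
    """Turn (transactionID, value) pairs into one dict per product item."""
    attrs = defaultdict(dict)   # transactionID -> {"city": ..., "income": ..., "members": ...}
    items = defaultdict(list)   # transactionID -> [product items]
    for tid, value in rows:
        for prefix, field in PREFIXES.items():
            if value.startswith(prefix):
                attrs[tid][field] = value[len(prefix):]
                break
        else:
            # Assumption: a value with no known prefix is a product item.
            items[tid].append(value)
    return [
        {"id": tid, **attrs[tid], "description": item}
        for tid, tid_items in items.items()
        for item in tid_items
    ]

# Example record from above, with the attribute rows out of order on purpose.
sample = [
    ("400071", "banana"),
    ("400071", "Members_5"),
    ("400071", "City_Curitiba"),
    ("400071", "black_beans"),
    ("400071", "Income_above_43"),
    ("400071", "potato"),
]

for row in flatten_rows(sample):
    print(row)
```

The resulting dictionaries use the same keys as the `fieldnames` list in the script, so they could be passed to `csv.DictWriter.writerow` as long as every family has all three attributes.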
