# Task
Import the CSV file "Test.csv" from Google Drive and write each row into a separate JSON file.

## Mount Google Drive

### Subtask:
Mount your Google Drive to access the CSV file.


**Reasoning**:
Mounting Google Drive at /content/drive exposes its files to the Colab runtime, so the CSV file can be read by path.



In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Load CSV

### Subtask:
Read the CSV file named "Test.csv" into a pandas DataFrame.


**Reasoning**:
Import pandas and read the CSV file from Google Drive into a DataFrame, then display the head and info of the DataFrame.



In [None]:
import pandas as pd
import os

file_path = '/content/drive/Shareddrives/AmFamSharedDrive/UW_Madison_DSHB_AmFam_Capstone/02_Data/persona/minmax.csv' # Corrected path

if os.path.exists(file_path):
    try:
        df = pd.read_csv(file_path)
        display(df.head())
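        # df.info() prints its summary directly and returns None,
        # so display() also shows a trailing "None" in the output.
        display(df.info())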
    except Exception as e:
        print(f"Error reading the CSV file: {e}")
else:
    print(f"File not found at {file_path}")

Unnamed: 0,county,mode,age,income,home_value,wind_avg,storm_count,education,preparedness,risk_response,kids,employment,status,flood_zone,elevation_diff,flood_depth,construction_year,ownership
0,Duval,base,50,33000,290500,70.4,133,"Some college, no degree",Individuals who are Prepared,Likely,1.0,employed,Single,X,,0.0,1986,own
1,Duval,max,95,1258000,1327500,160.0,133,Bachelor's degree,Very likely,Very likely,2.0,employed,Married with spouse present,X,,0.0,1986,own
2,Duval,min,18,4,199075,25.0,133,"Some college, no degree",Individuals who are Prepared,Likely,0.0,unemployed,Single,X,,0.0,1986,rent
3,Duval,random,84,30000,249948,70.0,133,Bachelor's degree,Individuals who are Prepared,Likely,0.0,employed,Single,AE,,,1997,rent
4,Duval,random,52,53000,348819,90.0,133,Associate's degree,Individuals who are Prepared,Very likely,2.0,employed,Married with spouse present,AE,,,1997,own


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80 entries, 0 to 79
Data columns (total 18 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   county             80 non-null     object 
 1   mode               80 non-null     object 
 2   age                80 non-null     int64  
 3   income             80 non-null     int64  
 4   home_value         80 non-null     int64  
 5   wind_avg           80 non-null     float64
 6   storm_count        80 non-null     int64  
 7   education          80 non-null     object 
 8   preparedness       80 non-null     object 
 9   risk_response      79 non-null     object 
 10  kids               80 non-null     float64
 11  employment         80 non-null     object 
 12  status             80 non-null     object 
 13  flood_zone         80 non-null     object 
 14  elevation_diff     43 non-null     float64
 15  flood_depth        67 non-null     float64
 16  construction_year  80 non-nu

None

Replace income values of 1 or 4 with 0.

In [None]:
df['income'] = df['income'].replace([1, 4], 0)

In [None]:
df['income'].value_counts()

income,count
0,16
33000,3
35000,3
30000,2
95000,2
53000,2
1479000,2
63000,2
50000,2
1471000,2
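
The replacement step above can be checked on a small example. This is a minimal sketch on made-up values (the `income` Series here is hypothetical toy data, not taken from the real file); it only illustrates how `replace` with a list of values behaves.

```python
import pandas as pd

# Hypothetical toy data, not the values from the real CSV file.
income = pd.Series([1, 4, 33000, 4, 53000], name="income")

# Map the values 1 and 4 to 0; every other value is left unchanged.
cleaned = income.replace([1, 4], 0)

print(cleaned.tolist())
print(cleaned.value_counts().to_dict())
```

Running `value_counts()` after the replacement, as the notebook does, is a quick way to confirm that no 1 or 4 entries remain.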


## Convert to JSON

### Subtask:
Iterate through each row of the DataFrame and save it as a separate JSON file.

**Reasoning**:
Iterate through each row of the DataFrame and save each row as a JSON file in a specified output directory.

In [None]:
import json
import os

output_dir = '/content/drive/Shareddrives/AmFamSharedDrive/UW_Madison_DSHB_AmFam_Capstone/02_Data/persona/json'
os.makedirs(output_dir, exist_ok=True)
print(f"Output directory created or verified: '{output_dir}'")

print("\nStarting file generation...")
count = 0
for index, row in df.iterrows():

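    # Build file-name-safe labels: trim whitespace, then replace spaces
    # with underscores. Periods become underscores in the county label
    # (so "St. Johns" turns into "St__Johns") and are dropped from the mode label.
    county_value = str(row['county']).strip().replace(' ', '_').replace('.', '_')
    mode_value = str(row['mode']).strip().replace(' ', '_').replace('.', '')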

    json_filename = os.path.join(
        output_dir,
        f'persona_row_{index}_{county_value}_{mode_value}.json')

    row.to_json(json_filename, indent=4)
    count += 1
    print(f"  -> Saved: {os.path.basename(json_filename)}")

print(f"\nSuccessfully saved {count} rows as separate JSON files in '{output_dir}'")

Output directory created or verified: '/content/drive/Shareddrives/AmFamSharedDrive/UW_Madison_DSHB_AmFam_Capstone/02_Data/persona/json'

Starting file generation...
  -> Saved: persona_row_0_Duval_base.json
  -> Saved: persona_row_1_Duval_max.json
  -> Saved: persona_row_2_Duval_min.json
  -> Saved: persona_row_3_Duval_random.json
  -> Saved: persona_row_4_Duval_random.json
  -> Saved: persona_row_5_Broward_base.json
  -> Saved: persona_row_6_Broward_max.json
  -> Saved: persona_row_7_Broward_min.json
  -> Saved: persona_row_8_Broward_random.json
  -> Saved: persona_row_9_Broward_random.json
  -> Saved: persona_row_10_Alachua_base.json
  -> Saved: persona_row_11_Alachua_max.json
  -> Saved: persona_row_12_Alachua_min.json
  -> Saved: persona_row_13_Alachua_random.json
  -> Saved: persona_row_14_Alachua_random.json
  -> Saved: persona_row_15_St__Johns_base.json
  -> Saved: persona_row_16_St__Johns_max.json
  -> Saved: persona_row_17_St__Johns_min.json
  -> Saved: persona_row_18_St__Joh
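
To sanity-check the per-row export, one option is to write a single row and read it back. The sketch below is only an illustration under stated assumptions: `safe_name` is a hypothetical helper (not part of the notebook), the row is made-up toy data, and a temporary directory stands in for the Drive output folder.

```python
import json
import os
import tempfile

import pandas as pd


def safe_name(value):
    """Hypothetical helper: make a value usable inside a file name."""
    return str(value).strip().replace(" ", "_").replace(".", "_")


# Toy row; elevation_diff is deliberately missing (NaN).
row = pd.Series(
    {"county": "St. Johns", "mode": "base", "age": 50, "elevation_diff": float("nan")}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(
        tmp_dir,
        f"persona_row_0_{safe_name(row['county'])}_{safe_name(row['mode'])}.json",
    )
    # Same call as in the notebook: one Series written as one JSON object.
    row.to_json(path, indent=4)

    # Read the file back with the standard library to inspect the result.
    with open(path) as f:
        loaded = json.load(f)

file_name = os.path.basename(path)
print(file_name)
print(loaded)
```

Missing values are written as JSON `null`, so they come back as `None` when the file is loaded with `json.load`. Any downstream code that consumes these files may need to handle that case.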

## Load CSV

### Subtask:
Retry loading the CSV file named "Test.csv" into a pandas DataFrame, considering the previous failure where the file was not found.


**Reasoning**:
Retry loading the CSV file using the specified path and check for its existence before attempting to read it into a DataFrame.
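
The check-then-read pattern described above could look like the following. This is a minimal sketch, not the notebook's own cell: `load_csv_if_exists` is a hypothetical helper name, and the example runs against a temporary file rather than the Drive path.

```python
import tempfile
from pathlib import Path
from typing import Optional

import pandas as pd


def load_csv_if_exists(path) -> Optional[pd.DataFrame]:
    """Hypothetical helper: return a DataFrame, or None if the file is missing."""
    path = Path(path)
    if not path.is_file():
        print(f"File not found at {path}")
        return None
    return pd.read_csv(path)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Toy file standing in for the CSV on Google Drive.
    csv_path = Path(tmp_dir) / "example.csv"
    csv_path.write_text("county,income\nDuval,33000\n")

    found = load_csv_if_exists(csv_path)
    missing = load_csv_if_exists(Path(tmp_dir) / "missing.csv")

print(found)
print(missing)
```

Returning `None` for a missing file lets the caller decide whether to retry with a different path or stop, instead of failing inside the read call.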

