![title banner](../banners/start_banner.png)

# Topic: Loading Data

This notebook provides sample code for loading the following file types with pandas:

- .csv
- .tsv
- .json

Further reference: https://pandas.pydata.org/pandas-docs/stable/reference/io.html

In [1]:
import pandas as pd
import json

In [2]:
# csv
df1 = pd.read_csv('../test_datasets/mental_health_survey.csv')
df1.head(3)

Unnamed: 0,Indicator,Group,State,Subgroup,Phase,Time Period,Time Period Label,Time Period Start Date,Time Period End Date,Value,LowCI,HighCI,Confidence Interval,Quartile Range,Suppression Flag
0,Took Prescription Medication for Mental Health...,National Estimate,United States,United States,2,13,Aug 19 - Aug 31,08/19/2020 12:00:00 AM,08/31/2020 12:00:00 AM,19.4,19.0,19.8,19.0 - 19.8,,
1,Took Prescription Medication for Mental Health...,By Age,United States,18 - 29 years,2,13,Aug 19 - Aug 31,08/19/2020 12:00:00 AM,08/31/2020 12:00:00 AM,18.7,17.2,20.3,17.2 - 20.3,,
2,Took Prescription Medication for Mental Health...,By Age,United States,30 - 39 years,2,13,Aug 19 - Aug 31,08/19/2020 12:00:00 AM,08/31/2020 12:00:00 AM,18.3,17.3,19.2,17.3 - 19.2,,
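
If the `Unnamed: 0` column in the output above is a row index that was saved along with the file (an assumption, since the file itself is not shown here), it can be consumed as the index at load time. The sketch below also converts a timestamp column with an explicit format. The inline CSV text and its rows are hypothetical stand-ins for the survey file, not real data.

```python
import io
import pandas as pd

# Hypothetical inline stand-in for the survey file (not the real data).
# The header starts with an empty name, which pandas would otherwise
# label "Unnamed: 0".
csv_text = (
    ",Indicator,Time Period Start Date,Value\n"
    "0,Example indicator,08/19/2020 12:00:00 AM,19.4\n"
    "1,Example indicator,08/19/2020 12:00:00 AM,18.7\n"
)

# index_col=0 uses the first column as the row index instead of
# loading it as a regular column.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Parse the timestamp strings into datetime64 values with an explicit format.
df["Time Period Start Date"] = pd.to_datetime(
    df["Time Period Start Date"], format="%m/%d/%Y %I:%M:%S %p"
)
```

Passing an explicit `format` avoids ambiguity between month-first and day-first dates.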


In [3]:
# tsv
df2 = pd.read_csv('../test_datasets/mental_health_survey.tsv', sep='\t')
df2.head(3)

Unnamed: 0,Indicator,Group,State,Subgroup,Phase,Time Period,Time Period Label,Time Period Start Date,Time Period End Date,Value,LowCI,HighCI,Confidence Interval,Quartile Range,Suppression Flag
0,Took Prescription Medication for Mental Health...,National Estimate,United States,United States,2,13,Aug 19 - Aug 31,08/19/2020 12:00:00 AM,08/31/2020 12:00:00 AM,19.4,19.0,19.8,19.0 - 19.8,,
1,Took Prescription Medication for Mental Health...,By Age,United States,18 - 29 years,2,13,Aug 19 - Aug 31,08/19/2020 12:00:00 AM,08/31/2020 12:00:00 AM,18.7,17.2,20.3,17.2 - 20.3,,
2,Took Prescription Medication for Mental Health...,By Age,United States,30 - 39 years,2,13,Aug 19 - Aug 31,08/19/2020 12:00:00 AM,08/31/2020 12:00:00 AM,18.3,17.3,19.2,17.3 - 19.2,,
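
As a side note, `pd.read_table` uses a tab as its default separator, so it can stand in for `pd.read_csv(..., sep='\t')`. A minimal sketch with hypothetical inline data in place of the survey file:

```python
import io
import pandas as pd

# Hypothetical tab-separated text (not the real survey data)
tsv_text = "Indicator\tGroup\tValue\nExample indicator\tBy Age\t18.7\n"

# Explicit tab separator, as in the cell above
df_a = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# read_table defaults to sep="\t"
df_b = pd.read_table(io.StringIO(tsv_text))
```

Both calls should produce the same DataFrame; which one to use is a matter of preference.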


In [4]:
# json

# first, load the JSON file (the context manager closes the file handle)
with open('../test_datasets/mental_health_survey.json') as f:
    data = json.load(f)

# inspect the JSON structure to find the records; here they sit under the "data" key
necessary_data = data["data"]

# convert to a pandas DataFrame, keeping the columns from position 8 onward
df3 = pd.DataFrame(necessary_data).iloc[:, 8:]
df3.head()

Unnamed: 0,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22
0,Took Prescription Medication for Mental Health...,National Estimate,United States,United States,2,13,Aug 19 - Aug 31,2020-08-19T00:00:00,2020-08-31T00:00:00,19.4,19.0,19.8,19.0 - 19.8,,
1,Took Prescription Medication for Mental Health...,By Age,United States,18 - 29 years,2,13,Aug 19 - Aug 31,2020-08-19T00:00:00,2020-08-31T00:00:00,18.7,17.2,20.3,17.2 - 20.3,,
2,Took Prescription Medication for Mental Health...,By Age,United States,30 - 39 years,2,13,Aug 19 - Aug 31,2020-08-19T00:00:00,2020-08-31T00:00:00,18.3,17.3,19.2,17.3 - 19.2,,
3,Took Prescription Medication for Mental Health...,By Age,United States,40 - 49 years,2,13,Aug 19 - Aug 31,2020-08-19T00:00:00,2020-08-31T00:00:00,20.4,19.5,21.3,19.5 - 21.3,,
4,Took Prescription Medication for Mental Health...,By Age,United States,50 - 59 years,2,13,Aug 19 - Aug 31,2020-08-19T00:00:00,2020-08-31T00:00:00,21.2,20.2,22.2,20.2 - 22.2,,
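
The columns in the output above are labelled 8 through 22 because each record in `data["data"]` is a plain list with no field names. One way to get readable labels is to assign them after slicing. The sketch below uses a hypothetical inline dictionary and hypothetical column names; for the real file, the names would need to match the actual column order.

```python
import pandas as pd

# Hypothetical stand-in for the parsed JSON: "data" holds a list of rows,
# and the first 8 entries of each row are placeholder fields to discard.
data = {
    "data": [
        ["m"] * 8 + ["Example indicator", "By Age", 18.7],
        ["m"] * 8 + ["Example indicator", "By Sex", 20.1],
    ]
}

# Hypothetical names for the columns that remain after the slice
names = ["Indicator", "Group", "Value"]

# Drop the first 8 columns, then replace the integer labels with names
df = pd.DataFrame(data["data"]).iloc[:, 8:]
df.columns = names
```

The number of names must equal the number of remaining columns, otherwise pandas raises a `ValueError`.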


![end banner](../banners/finish_banner.png)