### Pandas
Pandas is built on top of NumPy and enhances it with:  
1) data labels and descriptive indices  
2) robust handling of common data formats and missing data  
https://pandas.pydata.org/docs/user_guide/index.html
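
A small sketch of both ideas, using made-up values: arithmetic between two Series lines values up by label rather than by position, and a label present in only one Series produces a missing value (NaN) that pandas can detect and fill.

```python
import pandas as pd

# Two Series whose labels only partly overlap (made-up values)
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Addition aligns on labels: "x" has no partner in b, so its result is NaN
total = a + b
print(total)

# Missing values can be detected and replaced
print(total.isna())
print(total.fillna(0))
```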

In [2]:
# Import pandas and NumPy under their conventional aliases, pd and np
import pandas as pd
import numpy as np

#### We can analyze data in pandas with:

1) Series (one-dimensional labeled arrays)  
2) DataFrames (two-dimensional tables of rows and columns)  

In [None]:
Data = [0, 1, 4, 9, 16]  # numeric data

# Create a Series with the default integer index (0, 1, 2, ...)
s = pd.Series(Data)

# Display the Series: index labels on the left, values on the right
s

In [None]:
s.index

In [None]:
# Create a Series with a custom (explicit) index
Index = ['a', 'b', 'c', 'd', 'e']
s = pd.Series(Data, index=Index)
s

In [None]:
s.index

In [None]:
# Count how often each unique value occurs (dropna=False also counts missing values)
Data = [0, 1, 4, 9, 16, 4, 4]
s = pd.Series(Data)
s.value_counts(dropna=False)
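
The list above has no missing values, so `dropna=False` changes nothing there. In this sketch one `np.nan` is appended (an illustrative value, not from the data) to show the difference: with `dropna=False` the NaN gets its own count, and by default it is left out.

```python
import numpy as np
import pandas as pd

# The Series from above plus one missing value (np.nan is illustrative)
s = pd.Series([0, 1, 4, 9, 16, 4, 4, np.nan])

counts_with_nan = s.value_counts(dropna=False)  # NaN gets its own row
counts_default = s.value_counts()               # NaN is left out
print(counts_with_nan)
```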

DataFrames:  
A DataFrame is a two-dimensional data structure in pandas made up of rows and columns; each column is a Series that shares the same row index.  
pd.DataFrame(data, index, columns, dtype, copy)
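
A short sketch of the `index`, `columns` and `dtype` arguments, on a trimmed version of the dictionary used below (illustrative values): `columns` chooses which dictionary keys to keep and in what order, and `dtype` forces a type for the data.

```python
import pandas as pd

data = {"Name": ["A", "B"], "age": [27, 24], "address": ["PA", "NY"]}

# index= sets the row labels; columns= picks and orders the keys ("address" is dropped)
people = pd.DataFrame(data, index=["a", "b"], columns=["age", "Name"])
print(people)

# dtype= forces a type for the values
floats = pd.DataFrame({"age": [27, 24]}, dtype="float64")
print(floats.dtypes)
```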

In [None]:
data={'Name':['A','B','C','D'],
     'age':[27,24,22,32],
     'address':['PA','NY','PA','MA'],
     'Edu':['MS','MA','Ba','Phd']}
data

In [None]:
# Convert the dictionary into a DataFrame (keys become column names)
df=pd.DataFrame(data)
df

In [None]:
# Same data, with custom row labels
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])
df

In [None]:
df.shape  # (number of rows, number of columns): (4, 4)

In [None]:
# Select a row by its integer position (implicit index) with iloc
df.iloc[0]

In [None]:
# Slice rows by position: rows 0 and 1 (the stop position 2 is excluded)
df.iloc[0:2]

In [None]:
# With a non-integer index like this one, df[0:2] also slices rows by position, like df.iloc[0:2]
df[0:2]

In [None]:
# Use loc to select by explicit labels.
# A label slice includes BOTH the start and the stop: rows 'a', 'b' and 'c'
df.loc[:'c']
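
The difference between the two slicing styles in one place, on a single-column version of the table above (a sketch, not one of the notebook's own cells): `iloc` excludes the stop position, while `loc` includes the stop label.

```python
import pandas as pd

ages = pd.DataFrame({"age": [27, 24, 22, 32]}, index=["a", "b", "c", "d"])

by_position = ages.iloc[0:2]   # positions 0 and 1; stop position 2 is excluded
by_label = ages.loc["a":"c"]   # labels a, b and c; stop label "c" is included

print(by_position.index.tolist())  # ['a', 'b']
print(by_label.index.tolist())     # ['a', 'b', 'c']
```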

In [None]:
# Select a single column by name (returns a Series)
df['Edu']

### Import and Export data

#### Import data
```
pd.read_csv(filename) # From a CSV file
pd.read_table(filename) # From a delimited text file (like TSV)
pd.read_excel(filename) # From an Excel file
pd.read_sql(query, connection_object) # Reads from a SQL table/database
pd.read_json(json_string) # Reads from a JSON formatted string, URL or file.
pd.read_html(url) # Parses an HTML URL, string or file and returns its tables as a list of DataFrames
pd.read_clipboard() # Takes the contents of your clipboard and passes it to read_table()
pd.DataFrame(dict) # From a dict: keys become column names, values (lists) become the data
pd.read_stata(filename) # From a Stata (.dta) file
```
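
The readers accept file-like objects as well as paths, which makes it easy to try them without a file on disk. This sketch feeds a small, made-up CSV (and a tab-separated version) through `io.StringIO`; note that the empty age value is read as NaN, which turns the column into floats.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a file on disk; C's age is missing
csv_text = "Name,age,address\nA,27,PA\nB,24,NY\nC,,PA\n"
from_csv = pd.read_csv(io.StringIO(csv_text))
print(from_csv)
print(from_csv.dtypes)  # age becomes float64 because of the NaN

# read_table expects tab-separated values by default
tsv_text = "Name\tage\nA\t27\nB\t24\n"
from_tsv = pd.read_table(io.StringIO(tsv_text))
print(from_tsv)
```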

In [None]:
un=pd.read_csv('/Users/huilin/Desktop/unemployment.csv')
un

### Explore Data
```
df.shape # Prints number of rows and columns in dataframe
df.head(n) # Prints first n rows of the DataFrame
df.tail(n) # Prints last n rows of the DataFrame
df.info() # Index, Datatype and Memory information
df.describe() # Summary statistics for numerical columns
df.mean() # Returns the mean of each column (pass numeric_only=True if there are text columns)
df.corr() # Returns the correlation between columns (pass numeric_only=True if there are text columns)
df.count() # Returns the number of non-null values in each DataFrame column
df.max() # Returns the highest value in each column
df.min() # Returns the lowest value in each column
df.median() # Returns the median of each column
df.std() # Returns the standard deviation of each column
s.value_counts(dropna=False) # Counts of each unique value in a Series s, including NaN
```
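
A sketch on a small mixed-type table (illustrative values). In pandas 2.0 and later, `df.mean()` no longer skips text columns silently, so on a frame like this it typically raises a `TypeError` unless you pass `numeric_only=True`; `describe()` summarizes only the numeric columns by default.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "age": [27, 24, 22, 32],
    "Edu": ["MS", "MA", "Ba", "Phd"],
})

print(people.shape)                        # (4, 3)
print(people.head(2))                      # first 2 rows
mean_age = people.mean(numeric_only=True)  # skips the text columns; mean age is 26.25
print(mean_age)
summary = people.describe()                # count, mean, std, min, quartiles, max
print(summary)
```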

In [None]:
un.shape  # shape is an attribute, not a method, so no parentheses

In [None]:
un.head()  # show the first 5 rows of the DataFrame

In [None]:
# Read a Stata (.dta) file
children=pd.read_stata('/Users/huilin/desktop/children_merge.dta')
children

#### Export data:
```
df.to_csv(filename)   # Writes to a CSV file    
df.to_excel(filename)   # Writes to an Excel file    
df.to_sql(table_name, connection_object)    # Writes to a SQL table    
df.to_json(filename)    # Writes to a file in JSON format   
df.to_html(filename)    # Saves as an HTML table   
df.to_clipboard()     # Writes to the clipboard   
```

In [None]:
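# Export the DataFrame as a .csv file (here, to the Downloads folder)
# On Windows, put r before the path string so backslashes are kept literally, e.g.
# df.to_csv(r'Path where you want to store the exported CSV file\File Name.csv')
# On a Mac, right-click the file in Finder, hold Option, choose 'Copy "<file name>" as Pathname', and paste the path
children.to_csv('/Users/huilin/Downloads/children_merge.csv')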

#### Upload files from your local file system if you use Google Colab
```
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(  
      name=fn, length=len(uploaded[fn])))
      
``` 
#### Download files to your local file system if you use Google Colab
```
from google.colab import files

with open('file', 'w') as f:
  f.write('some content')

files.download('file')
```

### Adjust data

In [None]:
# Read data stored in other formats; pandas' I/O functions are listed at
# https://pandas.pydata.org/pandas-docs/stable/reference/io.html
# e.g. df = pd.read_stata('path/to/file.dta')
children = pd.read_stata('/Users/huilin/Desktop/children_merge.dta')


Data set: children_merge  
idind_c: INDIVIDUAL ID   
idind_f: FATHER'S INDIVIDUAL ID   
idind_m: MOTHER'S INDIVIDUAL ID   
wave: SURVEY YEAR   
a11_c: Completed Years of Formal Ed. in Regular School   
a12_c: Highest Level of Education Attained   
age_c: Calculated Age in Years to 0 Decimal Points   
gender_c: GENDER  (gender=1 for male; gender=2 for female) 
health_c: CURRENT HEALTH STATUS (SELF-REPORT)    
hhinc_pc_c: Per capita household income   
wage_c: AVERAGE MONTHLY WAGE LAST YEAR   
late_ben: POLICY INDICATOR (late_ben=1 if the group is affected by the policy; otherwise 0)  
BMI_c: BODY MASS INDEX

### Index

In [None]:
children

In [None]:
children.dtypes

In [None]:
# Change a column's type; astype returns a new DataFrame and leaves children unchanged
children.astype({'wave': 'int32'}).dtypes
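
`astype` does not modify the DataFrame it is called on. A sketch with a two-row stand-in for `children` (made-up values) shows that the original keeps its float type until the result is assigned back. Casting a column that still contains NaN to an integer type such as `int32` raises an error, so check for missing values first.

```python
import pandas as pd

# Two-row stand-in for `children` (made-up values); wave is stored as float
original = pd.DataFrame({"wave": [2011.0, 2015.0], "age_c": [11.0, 17.0]})

converted = original.astype({"wave": "int32"})
print(original["wave"].dtype)   # float64: astype did not change the original
print(converted["wave"].dtype)  # int32

# To keep the change, assign the result back:
# original = original.astype({"wave": "int32"})
```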

In [None]:
# Use a column as the row index; pass a list to combine several columns
ch_wave = children.set_index(['wave'])
ch_wave

In [None]:
ch_wave.loc[2011]  # all rows from the 2011 survey wave

In [None]:
ch_wave.loc[2011].shape

In [None]:
# Another way to set the index: choose it while reading the file
ch_wave1 = pd.read_stata('/Users/huilin/Desktop/children_merge.dta', index_col='wave')
ch_wave1

In [None]:
# Use two columns as the index (a MultiIndex), sorted by gender and then age
children = pd.read_stata('/Users/huilin/Desktop/children_merge.dta')
ch_gender_age = children.set_index(['gender_c', 'age_c']).sort_index()
ch_gender_age

In [None]:
# gender=1 for male; gender=2 for female
# How do we get summary statistics for 2-year-old boys?
# Select gender 1 on the first index level, then age 2 on the second
ch_gender_age.loc[1].loc[2].describe()
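
A sketch of this kind of two-level lookup on a tiny made-up table (hypothetical values, integer codes for simplicity): the chained form `.loc[1].loc[2]` (gender first, then age) and a single tuple key `.loc[(1, 2), :]` select the same rows.

```python
import pandas as pd

# Hypothetical toy data using the survey's column names (integer codes for simplicity)
kids = pd.DataFrame({
    "gender_c": [1, 1, 2, 1],
    "age_c": [2, 2, 2, 5],
    "BMI_c": [16.0, 17.0, 15.5, 15.0],
})
by_gender_age = kids.set_index(["gender_c", "age_c"]).sort_index()

# Chained: gender 1 on the first level, then age 2 on the remaining level
boys_2_chained = by_gender_age.loc[1].loc[2]
# One step with a tuple key covering both levels
boys_2 = by_gender_age.loc[(1, 2), :]

print(boys_2_chained["BMI_c"].describe())
```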

In [None]:
# Keep only 10-year-old children: the comparison gives a True/False mask that selects rows
children = pd.read_stata('/Users/huilin/Desktop/children_merge.dta')
children_10 = children[children['age_c'] == 10]
children_10

In [None]:
# Keep everyone except 10-year-old children
children = pd.read_stata('/Users/huilin/Desktop/children_merge.dta')
children_10_remainder = children[children['age_c'] != 10]
children_10_remainder

In [None]:
# Sort rows by age (ascending); the original index labels are kept
children.sort_values('age_c')
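
A few more filtering patterns, sketched on a made-up four-row table with the same column names: combining conditions with `&` (and) or `|` (or), where each condition needs its own parentheses; `isin()` for a list of values; and `sort_values(..., ascending=False)` for descending order.

```python
import pandas as pd

# Made-up rows using the survey's column names
kids = pd.DataFrame({
    "age_c": [10.0, 10.0, 6.0, 17.0],
    "gender_c": [1.0, 2.0, 2.0, 1.0],
})

# & means "and", | means "or"; each condition needs its own parentheses
girls_10 = kids[(kids["age_c"] == 10) & (kids["gender_c"] == 2)]
# isin() keeps rows whose value appears in the list
age_6_or_17 = kids[kids["age_c"].isin([6.0, 17.0])]
# Descending order: oldest first
oldest_first = kids.sort_values("age_c", ascending=False)
```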

In [3]:
# Adjust data (re-import the libraries; np is used for np.log below)
import pandas as pd
import numpy as np

In [4]:
children = pd.read_stata('/Users/huilin/Desktop/children_merge.dta')
children

|   | t1_c | year_c | idind_c | idind_f | idind_m | wave | a11_c | a12_c | age_c | gender_c | health_c | BMI_c | hhinc_pc_c | wage_c | late_ben | nchild |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11.0 | 2000.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 16.0 | 1.0 | 11.0 | 1.0 | NaN | 21.484375 | 47160.000000 | NaN | 1.0 | 1.0 |
| 1 | 11.0 | 1993.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 25.0 | 3.0 | 17.0 | 1.0 | NaN | 20.549889 | 54000.000000 | NaN | 1.0 | 1.0 |
| 2 | 11.0 | 1993.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 25.0 | 3.0 | 17.0 | 1.0 | NaN | 23.671255 | 28020.000000 | NaN | 1.0 | 1.0 |
| 3 | 11.0 | 2004.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 11.0 | 0.0 | 6.0 | 2.0 | NaN | 13.227513 | 21266.666667 | NaN | 1.0 | 1.0 |
| 4 | 11.0 | 2004.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 11.0 | 0.0 | 7.0 | 1.0 | NaN | 17.361109 | 24001.666667 | NaN | 1.0 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5528 | 55.0 | 2005.0 | 5.513040e+11 | 5.513040e+11 | 5.513040e+11 | 2011.0 | 12.0 | 0.0 | 6.0 | 2.0 | NaN | 14.934183 | 2300.000000 | NaN | 0.0 | 1.0 |
| 5529 | 55.0 | 2005.0 | 5.513040e+11 | 5.513040e+11 | 5.513040e+11 | 2015.0 | 13.0 | NaN | 10.0 | 2.0 | NaN | NaN | 4233.333333 | NaN | 0.0 | 1.0 |
| 5530 | 55.0 | 2001.0 | 5.513042e+11 | 5.513042e+11 | 5.513042e+11 | 2015.0 | 23.0 | 1.0 | 14.0 | 1.0 | NaN | NaN | 4642.500000 | NaN | 0.0 | 1.0 |
| 5531 | 55.0 | 1998.0 | 5.513042e+11 | 5.513042e+11 | 5.513042e+11 | 2015.0 | 26.0 | 3.0 | 16.0 | 2.0 | NaN | NaN | 29183.333333 | NaN | 0.0 | 1.0 |


In [5]:
# Rename columns to more meaningful names
ch_rename = children.rename(columns={"year_c": "birth year", "gender_c": "gender",
                                     "age_c": "age", "hhinc_pc_c": "household income",
                                     "BMI_c": "BMI", "late_ben": "policy"})
ch_rename

|   | t1_c | birth year | idind_c | idind_f | idind_m | wave | a11_c | a12_c | age | gender | health_c | BMI | household income | wage_c | policy | nchild |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11.0 | 2000.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 16.0 | 1.0 | 11.0 | 1.0 | NaN | 21.484375 | 47160.000000 | NaN | 1.0 | 1.0 |
| 1 | 11.0 | 1993.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 25.0 | 3.0 | 17.0 | 1.0 | NaN | 20.549889 | 54000.000000 | NaN | 1.0 | 1.0 |
| 2 | 11.0 | 1993.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 25.0 | 3.0 | 17.0 | 1.0 | NaN | 23.671255 | 28020.000000 | NaN | 1.0 | 1.0 |
| 3 | 11.0 | 2004.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 11.0 | 0.0 | 6.0 | 2.0 | NaN | 13.227513 | 21266.666667 | NaN | 1.0 | 1.0 |
| 4 | 11.0 | 2004.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 11.0 | 0.0 | 7.0 | 1.0 | NaN | 17.361109 | 24001.666667 | NaN | 1.0 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5528 | 55.0 | 2005.0 | 5.513040e+11 | 5.513040e+11 | 5.513040e+11 | 2011.0 | 12.0 | 0.0 | 6.0 | 2.0 | NaN | 14.934183 | 2300.000000 | NaN | 0.0 | 1.0 |
| 5529 | 55.0 | 2005.0 | 5.513040e+11 | 5.513040e+11 | 5.513040e+11 | 2015.0 | 13.0 | NaN | 10.0 | 2.0 | NaN | NaN | 4233.333333 | NaN | 0.0 | 1.0 |
| 5530 | 55.0 | 2001.0 | 5.513042e+11 | 5.513042e+11 | 5.513042e+11 | 2015.0 | 23.0 | 1.0 | 14.0 | 1.0 | NaN | NaN | 4642.500000 | NaN | 0.0 | 1.0 |
| 5531 | 55.0 | 1998.0 | 5.513042e+11 | 5.513042e+11 | 5.513042e+11 | 2015.0 | 26.0 | 3.0 | 16.0 | 2.0 | NaN | NaN | 29183.333333 | NaN | 0.0 | 1.0 |


In [7]:
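# Add a new column:
# np.log() computes the natural log of each BMI value (NaN stays NaN),
# and the result is stored in a new column named "log_BMI".
# .copy() makes ch_add independent; without it, ch_add and ch_rename would be the same object
ch_add = ch_rename.copy()
ch_add['log_BMI'] = np.log(ch_add['BMI'])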
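
Plain assignment such as `new = old` does not copy a DataFrame: both names point to the same object, so adding a column through one name changes the other. A sketch with made-up BMI values, which also shows that `np.log` leaves NaN as NaN:

```python
import numpy as np
import pandas as pd

old = pd.DataFrame({"BMI": [21.48, 20.55, np.nan]})  # made-up values
alias = old           # a second name for the same DataFrame
copied = old.copy()   # an independent copy

alias["log_BMI"] = np.log(alias["BMI"])  # natural log; NaN stays NaN

print("log_BMI" in old.columns)     # True: old changed as well
print("log_BMI" in copied.columns)  # False: the copy is unaffected
```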


In [8]:
# Question: how do we create a new column, Income = household income divided by 1000?
ch_add['Income'] = ch_add['household income'] / 1000
ch_add



|   | t1_c | birth year | idind_c | idind_f | idind_m | wave | a11_c | a12_c | age | gender | health_c | BMI | household income | wage_c | policy | nchild | log_BMI | Income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11.0 | 2000.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 16.0 | 1.0 | 11.0 | 1.0 | NaN | 21.484375 | 47160.000000 | NaN | 1.0 | 1.0 | 3.067326 | 47.160000 |
| 1 | 11.0 | 1993.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 25.0 | 3.0 | 17.0 | 1.0 | NaN | 20.549889 | 54000.000000 | NaN | 1.0 | 1.0 | 3.022856 | 54.000000 |
| 2 | 11.0 | 1993.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 25.0 | 3.0 | 17.0 | 1.0 | NaN | 23.671255 | 28020.000000 | NaN | 1.0 | 1.0 | 3.164261 | 28.020000 |
| 3 | 11.0 | 2004.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 11.0 | 0.0 | 6.0 | 2.0 | NaN | 13.227513 | 21266.666667 | NaN | 1.0 | 1.0 | 2.582299 | 21.266667 |
| 4 | 11.0 | 2004.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 11.0 | 0.0 | 7.0 | 1.0 | NaN | 17.361109 | 24001.666667 | NaN | 1.0 | 1.0 | 2.854233 | 24.001667 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5528 | 55.0 | 2005.0 | 5.513040e+11 | 5.513040e+11 | 5.513040e+11 | 2011.0 | 12.0 | 0.0 | 6.0 | 2.0 | NaN | 14.934183 | 2300.000000 | NaN | 0.0 | 1.0 | 2.703653 | 2.300000 |
| 5529 | 55.0 | 2005.0 | 5.513040e+11 | 5.513040e+11 | 5.513040e+11 | 2015.0 | 13.0 | NaN | 10.0 | 2.0 | NaN | NaN | 4233.333333 | NaN | 0.0 | 1.0 | NaN | 4.233333 |
| 5530 | 55.0 | 2001.0 | 5.513042e+11 | 5.513042e+11 | 5.513042e+11 | 2015.0 | 23.0 | 1.0 | 14.0 | 1.0 | NaN | NaN | 4642.500000 | NaN | 0.0 | 1.0 | NaN | 4.642500 |
| 5531 | 55.0 | 1998.0 | 5.513042e+11 | 5.513042e+11 | 5.513042e+11 | 2015.0 | 26.0 | 3.0 | 16.0 | 2.0 | NaN | NaN | 29183.333333 | NaN | 0.0 | 1.0 | NaN | 29.183333 |


In [9]:
# Drop columns that are not needed; axis=1 means columns
ch_drop_columns = ch_add.drop(['a11_c', 'a12_c', 'health_c', 'wage_c', 'nchild', 'household income'], axis=1)
ch_drop_columns

|   | t1_c | birth year | idind_c | idind_f | idind_m | wave | age | gender | BMI | policy | log_BMI | Income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11.0 | 2000.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 11.0 | 1.0 | 21.484375 | 1.0 | 3.067326 | 47.160000 |
| 1 | 11.0 | 1993.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 17.0 | 1.0 | 20.549889 | 1.0 | 3.022856 | 54.000000 |
| 2 | 11.0 | 1993.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 17.0 | 1.0 | 23.671255 | 1.0 | 3.164261 | 28.020000 |
| 3 | 11.0 | 2004.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 6.0 | 2.0 | 13.227513 | 1.0 | 2.582299 | 21.266667 |
| 4 | 11.0 | 2004.0 | 1.111010e+11 | 1.111010e+11 | 1.111010e+11 | 2011.0 | 7.0 | 1.0 | 17.361109 | 1.0 | 2.854233 | 24.001667 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5528 | 55.0 | 2005.0 | 5.513040e+11 | 5.513040e+11 | 5.513040e+11 | 2011.0 | 6.0 | 2.0 | 14.934183 | 0.0 | 2.703653 | 2.300000 |
| 5529 | 55.0 | 2005.0 | 5.513040e+11 | 5.513040e+11 | 5.513040e+11 | 2015.0 | 10.0 | 2.0 | NaN | 0.0 | NaN | 4.233333 |
| 5530 | 55.0 | 2001.0 | 5.513042e+11 | 5.513042e+11 | 5.513042e+11 | 2015.0 | 14.0 | 1.0 | NaN | 0.0 | NaN | 4.642500 |
| 5531 | 55.0 | 1998.0 | 5.513042e+11 | 5.513042e+11 | 5.513042e+11 | 2015.0 | 16.0 | 2.0 | NaN | 0.0 | NaN | 29.183333 |


In [22]:
# drop age at 30
#1 define age as the index and sort
ch_index_age=


| age | t1_c | birth year | idind_c | idind_f | idind_m | wave | gender | BMI | policy | log_BMI | Income |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 23.0 | 1997.0 | 2.311040e+11 | 2.311040e+11 | 2.311040e+11 | 1997.0 | 2.0 | 15.980974 | 1.0 | 2.771399 | 4.166667 |
| 0.0 | 55.0 | 2015.0 | 5.513010e+11 | 5.513010e+11 | 5.513010e+11 | 2015.0 | 1.0 | NaN | 0.0 | NaN | 16.000000 |
| 0.0 | 41.0 | 1993.0 | 4.112020e+11 | 4.112020e+11 | 4.112020e+11 | 1993.0 | 2.0 | NaN | 1.0 | NaN | 1.450750 |
| 0.0 | 21.0 | 2011.0 | 2.111031e+11 | 2.111031e+11 | 2.111031e+11 | 2011.0 | 1.0 | 18.325615 | 1.0 | 2.908300 | 6.000000 |
| 0.0 | 32.0 | 2005.0 | 3.212011e+11 | 3.212011e+11 | 3.212011e+11 | 2006.0 | 2.0 | NaN | 1.0 | NaN | 1.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 30.0 | 32.0 | 1979.0 | 3.212061e+11 | 3.212061e+11 | 3.212061e+11 | 2009.0 | 2.0 | 24.005487 | 0.0 | 3.178282 | 9.346020 |
| 30.0 | 32.0 | 1974.0 | 3.212061e+11 | NaN | 3.212060e+11 | 2004.0 | 2.0 | 20.177948 | 0.0 | 3.004590 | 12.202167 |
| 30.0 | 43.0 | 1976.0 | 4.311011e+11 | NaN | 4.311011e+11 | 2006.0 | 1.0 | 16.977144 | 0.0 | 2.831868 | 8.600000 |
| 30.0 | 37.0 | 1961.0 | 3.711040e+11 | NaN | 3.711040e+11 | 1991.0 | 1.0 | 20.987488 | 0.0 | 3.043926 | 2.285665 |


In [23]:
# Step 2: drop every row whose age label is 30
ch_index_age = ch_index_age.drop(30)
ch_index_age



| age | t1_c | birth year | idind_c | idind_f | idind_m | wave | gender | BMI | policy | log_BMI | Income |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 23.0 | 1997.0 | 2.311040e+11 | 2.311040e+11 | 2.311040e+11 | 1997.0 | 2.0 | 15.980974 | 1.0 | 2.771399 | 4.166667 |
| 0.0 | 55.0 | 2015.0 | 5.513010e+11 | 5.513010e+11 | 5.513010e+11 | 2015.0 | 1.0 | NaN | 0.0 | NaN | 16.000000 |
| 0.0 | 41.0 | 1993.0 | 4.112020e+11 | 4.112020e+11 | 4.112020e+11 | 1993.0 | 2.0 | NaN | 1.0 | NaN | 1.450750 |
| 0.0 | 21.0 | 2011.0 | 2.111031e+11 | 2.111031e+11 | 2.111031e+11 | 2011.0 | 1.0 | 18.325615 | 1.0 | 2.908300 | 6.000000 |
| 0.0 | 32.0 | 2005.0 | 3.212011e+11 | 3.212011e+11 | 3.212011e+11 | 2006.0 | 2.0 | NaN | 1.0 | NaN | 1.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 29.0 | 23.0 | 1981.0 | 2.311010e+11 | 2.311010e+11 | 2.311010e+11 | 2011.0 | 1.0 | 22.928888 | 0.0 | 3.132398 | 42.900000 |
| 29.0 | 32.0 | 1967.0 | 3.211030e+11 | 3.211030e+11 | 3.211030e+11 | 1997.0 | 1.0 | 28.383343 | 0.0 | 3.345803 | 3.216000 |
| 29.0 | 41.0 | 1979.0 | 4.112040e+11 | NaN | 4.112040e+11 | 2009.0 | 1.0 | NaN | 0.0 | NaN | 0.285714 |
| 29.0 | 45.0 | 1986.0 | 4.511040e+11 | 4.511040e+11 | 4.511040e+11 | 2015.0 | 2.0 | NaN | 0.0 | NaN | 18.350000 |


In [25]:
# Keep children aged 0 to 18; a label slice with loc includes both endpoints
ch_age18 = ch_index_age.loc[0:18]
ch_age18

| age | t1_c | birth year | idind_c | idind_f | idind_m | wave | gender | BMI | policy | log_BMI | Income |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 23.0 | 1997.0 | 2.311040e+11 | 2.311040e+11 | 2.311040e+11 | 1997.0 | 2.0 | 15.980974 | 1.0 | 2.771399 | 4.166667 |
| 0.0 | 55.0 | 2015.0 | 5.513010e+11 | 5.513010e+11 | 5.513010e+11 | 2015.0 | 1.0 | NaN | 0.0 | NaN | 16.000000 |
| 0.0 | 41.0 | 1993.0 | 4.112020e+11 | 4.112020e+11 | 4.112020e+11 | 1993.0 | 2.0 | NaN | 1.0 | NaN | 1.450750 |
| 0.0 | 21.0 | 2011.0 | 2.111031e+11 | 2.111031e+11 | 2.111031e+11 | 2011.0 | 1.0 | 18.325615 | 1.0 | 2.908300 | 6.000000 |
| 0.0 | 32.0 | 2005.0 | 3.212011e+11 | 3.212011e+11 | 3.212011e+11 | 2006.0 | 2.0 | NaN | 1.0 | NaN | 1.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 18.0 | 45.0 | 1993.0 | 4.512021e+11 | 4.512021e+11 | 4.512021e+11 | 2011.0 | 2.0 | 18.730490 | 1.0 | 2.930153 | 2.200000 |
| 18.0 | 41.0 | 1993.0 | 4.112031e+11 | 4.112031e+11 | 4.112031e+11 | 2011.0 | 2.0 | 22.090977 | 1.0 | 3.095169 | 13.475000 |
| 18.0 | 42.0 | 1979.0 | 4.211010e+11 | 4.211010e+11 | 4.211010e+11 | 1997.0 | 1.0 | 23.164062 | 0.0 | 3.142602 | 4.011000 |
| 18.0 | 37.0 | 1986.0 | 3.712031e+11 | 3.712031e+11 | 3.712031e+11 | 2004.0 | 1.0 | 24.023809 | 0.0 | 3.179045 | 3.183333 |
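
Label slicing with `loc` relies on the index being sorted, which is why `sort_index()` comes first. A sketch on a made-up four-row table: on the sorted index, `loc[0:18]` keeps every row with 0 <= age <= 18. On an unsorted index with repeated labels, the same slice typically raises a `KeyError`; a boolean mask on the index works in either order.

```python
import pandas as pd

# Made-up rows with a repeated, unsorted age index
by_age = pd.DataFrame(
    {"BMI": [15.98, 22.93, 18.33, 24.02]},
    index=pd.Index([0.0, 29.0, 0.0, 18.0], name="age"),
)

sorted_by_age = by_age.sort_index()
age_0_18 = sorted_by_age.loc[0:18]   # both endpoints included
print(age_0_18.index.tolist())       # [0.0, 0.0, 18.0]

# A boolean mask on the index does not depend on the row order
mask_0_18 = by_age[(by_age.index >= 0) & (by_age.index <= 18)]
```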


In [27]:
# Drop missing values
ch_missing = ch_age18.dropna(axis=0)  # axis=0 drops rows (observations) that contain any NaN
ch_missing

| age | t1_c | birth year | idind_c | idind_f | idind_m | wave | gender | BMI | policy | log_BMI | Income |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 23.0 | 1997.0 | 2.311040e+11 | 2.311040e+11 | 2.311040e+11 | 1997.0 | 2.0 | 15.980974 | 1.0 | 2.771399 | 4.166667 |
| 0.0 | 21.0 | 2011.0 | 2.111031e+11 | 2.111031e+11 | 2.111031e+11 | 2011.0 | 1.0 | 18.325615 | 1.0 | 2.908300 | 6.000000 |
| 0.0 | 32.0 | 2000.0 | 3.212030e+11 | 3.212030e+11 | 3.212030e+11 | 2000.0 | 1.0 | 17.853601 | 1.0 | 2.882205 | 3.040200 |
| 0.0 | 23.0 | 2011.0 | 2.311011e+11 | 2.311011e+11 | 2.311011e+11 | 2011.0 | 2.0 | 21.486303 | 1.0 | 3.067416 | 7.200000 |
| 0.0 | 21.0 | 2004.0 | 2.111031e+11 | 2.111031e+11 | 2.111031e+11 | 2004.0 | 2.0 | 17.949827 | 1.0 | 2.887580 | 3.036667 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 18.0 | 45.0 | 1993.0 | 4.512021e+11 | 4.512021e+11 | 4.512021e+11 | 2011.0 | 2.0 | 18.730490 | 1.0 | 2.930153 | 2.200000 |
| 18.0 | 41.0 | 1993.0 | 4.112031e+11 | 4.112031e+11 | 4.112031e+11 | 2011.0 | 2.0 | 22.090977 | 1.0 | 3.095169 | 13.475000 |
| 18.0 | 42.0 | 1979.0 | 4.211010e+11 | 4.211010e+11 | 4.211010e+11 | 1997.0 | 1.0 | 23.164062 | 0.0 | 3.142602 | 4.011000 |
| 18.0 | 37.0 | 1986.0 | 3.712031e+11 | 3.712031e+11 | 3.712031e+11 | 2004.0 | 1.0 | 24.023809 | 0.0 | 3.179045 | 3.183333 |
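
`dropna(axis=0)` drops a row if any column is missing, so a row with a valid BMI can still be lost because, say, the father's ID (`idind_f`) is missing. A sketch on made-up values comparing `subset`, `how` and `thresh`, which make the rule more targeted:

```python
import numpy as np
import pandas as pd

# Made-up rows: some BMI values and some father IDs are missing
sample = pd.DataFrame({
    "BMI": [15.98, np.nan, 18.33, np.nan],
    "idind_f": [2.31e11, 2.31e11, np.nan, np.nan],
    "Income": [4.17, 16.0, 6.0, 1.0],
})

any_missing = sample.dropna(axis=0)       # drop a row if ANY column is NaN
bmi_only = sample.dropna(subset=["BMI"])  # look only at BMI
all_missing = sample.dropna(how="all")    # drop only rows that are entirely NaN
at_least_two = sample.dropna(thresh=2)    # keep rows with at least 2 non-NaN values
```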


In [28]:
# Turn the age index back into a regular column; rows are renumbered 0..n-1
child = ch_missing.reset_index()
child

|   | age | t1_c | birth year | idind_c | idind_f | idind_m | wave | gender | BMI | policy | log_BMI | Income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 23.0 | 1997.0 | 2.311040e+11 | 2.311040e+11 | 2.311040e+11 | 1997.0 | 2.0 | 15.980974 | 1.0 | 2.771399 | 4.166667 |
| 1 | 0.0 | 21.0 | 2011.0 | 2.111031e+11 | 2.111031e+11 | 2.111031e+11 | 2011.0 | 1.0 | 18.325615 | 1.0 | 2.908300 | 6.000000 |
| 2 | 0.0 | 32.0 | 2000.0 | 3.212030e+11 | 3.212030e+11 | 3.212030e+11 | 2000.0 | 1.0 | 17.853601 | 1.0 | 2.882205 | 3.040200 |
| 3 | 0.0 | 23.0 | 2011.0 | 2.311011e+11 | 2.311011e+11 | 2.311011e+11 | 2011.0 | 2.0 | 21.486303 | 1.0 | 3.067416 | 7.200000 |
| 4 | 0.0 | 21.0 | 2004.0 | 2.111031e+11 | 2.111031e+11 | 2.111031e+11 | 2004.0 | 2.0 | 17.949827 | 1.0 | 2.887580 | 3.036667 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2553 | 18.0 | 45.0 | 1993.0 | 4.512021e+11 | 4.512021e+11 | 4.512021e+11 | 2011.0 | 2.0 | 18.730490 | 1.0 | 2.930153 | 2.200000 |
| 2554 | 18.0 | 41.0 | 1993.0 | 4.112031e+11 | 4.112031e+11 | 4.112031e+11 | 2011.0 | 2.0 | 22.090977 | 1.0 | 3.095169 | 13.475000 |
| 2555 | 18.0 | 42.0 | 1979.0 | 4.211010e+11 | 4.211010e+11 | 4.211010e+11 | 1997.0 | 1.0 | 23.164062 | 0.0 | 3.142602 | 4.011000 |
| 2556 | 18.0 | 37.0 | 1986.0 | 3.712031e+11 | 3.712031e+11 | 3.712031e+11 | 2004.0 | 1.0 | 24.023809 | 0.0 | 3.179045 | 3.183333 |


In [29]:
# Save the cleaned data; the age index is written as the first column
ch_missing.to_csv('/Users/huilin/desktop/child.csv')

In [30]:
# Read the saved data back in
child=pd.read_csv('/Users/huilin/desktop/child.csv')
child

|   | age | t1_c | birth year | idind_c | idind_f | idind_m | wave | gender | BMI | policy | log_BMI | Income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 23.0 | 1997.0 | 2.311040e+11 | 2.311040e+11 | 2.311040e+11 | 1997.0 | 2.0 | 15.980974 | 1.0 | 2.771399 | 4.166667 |
| 1 | 0.0 | 21.0 | 2011.0 | 2.111031e+11 | 2.111031e+11 | 2.111031e+11 | 2011.0 | 1.0 | 18.325615 | 1.0 | 2.908300 | 6.000000 |
| 2 | 0.0 | 32.0 | 2000.0 | 3.212030e+11 | 3.212030e+11 | 3.212030e+11 | 2000.0 | 1.0 | 17.853601 | 1.0 | 2.882205 | 3.040200 |
| 3 | 0.0 | 23.0 | 2011.0 | 2.311011e+11 | 2.311011e+11 | 2.311011e+11 | 2011.0 | 2.0 | 21.486303 | 1.0 | 3.067416 | 7.200000 |
| 4 | 0.0 | 21.0 | 2004.0 | 2.111031e+11 | 2.111031e+11 | 2.111031e+11 | 2004.0 | 2.0 | 17.949827 | 1.0 | 2.887580 | 3.036667 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2553 | 18.0 | 45.0 | 1993.0 | 4.512021e+11 | 4.512021e+11 | 4.512021e+11 | 2011.0 | 2.0 | 18.730490 | 1.0 | 2.930153 | 2.200000 |
| 2554 | 18.0 | 41.0 | 1993.0 | 4.112031e+11 | 4.112031e+11 | 4.112031e+11 | 2011.0 | 2.0 | 22.090977 | 1.0 | 3.095169 | 13.475000 |
| 2555 | 18.0 | 42.0 | 1979.0 | 4.211010e+11 | 4.211010e+11 | 4.211010e+11 | 1997.0 | 1.0 | 23.164062 | 0.0 | 3.142602 | 4.011000 |
| 2556 | 18.0 | 37.0 | 1986.0 | 3.712031e+11 | 3.712031e+11 | 3.712031e+11 | 2004.0 | 1.0 | 24.023810 | 0.0 | 3.179045 | 3.183333 |
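
`to_csv` writes the index as the first column, so after reading the file back, `age` is an ordinary column again. A sketch with an in-memory buffer instead of a file (made-up values): `index_col` restores it as the index, and for a DataFrame with a default, unnamed index, `index=False` avoids an extra `Unnamed: 0` column when the file is read back.

```python
import io
import pandas as pd

# Made-up rows with "age" as a named index, like ch_missing
by_age = pd.DataFrame({"BMI": [15.98, 18.33]}, index=pd.Index([0.0, 18.0], name="age"))

buffer = io.StringIO()
by_age.to_csv(buffer)        # header line is "age,BMI"
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="age")  # make "age" the index again

# With a default, unnamed index the first header cell is empty,
# so read_csv names that column "Unnamed: 0"; index=False avoids writing it
plain = pd.DataFrame({"x": [1, 2]})
with_index = pd.read_csv(io.StringIO(plain.to_csv()))
without_index = pd.read_csv(io.StringIO(plain.to_csv(index=False)))
```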
