<h1>Reading and Cleaning Data</h1>
<p>This section covers how to import files into Python and how to clean and modify the resulting data. When reading files, remember that relative file paths are resolved against the directory you are working in.</p>
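
<p>To make the point about the working directory concrete, here is a minimal sketch that prints the current directory and checks whether a file is visible from it. The file name <code>bigfoot.csv</code> is the one used in this section; the variable names are only illustrative.</p>

```python
from pathlib import Path

# Relative paths such as 'bigfoot.csv' are resolved against this directory.
working_dir = Path.cwd()
print(working_dir)

# Build the full path to the file we expect to read.
data_path = working_dir / "bigfoot.csv"

# Check before reading, so a wrong directory produces a clear message.
if data_path.exists():
    print(f"Found {data_path.name}")
else:
    print(f"{data_path.name} is not in {working_dir}")
```

<p>If the file is not found, either move it into the working directory or pass a full path to the reading function.</p>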

In [None]:
# Importing a library

import pandas as pd

# Alias: "as" binds the library to a shorter name (here "pd").
# This simplifies your code and reduces the amount of typing
# you need to do.

In [None]:
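# Importing datasets into Python

# The path is relative, so bigfoot.csv must be in the working directory.
bigfoot = pd.read_csv('bigfoot.csv')
%whos  # IPython magic: lists the variables currently defined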

In [None]:
# Viewing DataFrames: .head() and .tail()

bigfoot.head()  # shows the first 5 rows by default

# bigfoot.tail(10)  # pass a number to change how many rows are displayed

<h1>Cleaning</h1>

<h3>Why would you need to clean data?</h3>
<ul>
    <li>Rows or columns are not arranged in the layout you need</li>
    <li>Missing values need to be filled in or dropped</li>
    <li>Units are wrong or inconsistent</li>
    <li>Values are off by an order of magnitude</li>
    <li>Outliers skew the data</li>
    </ul>
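
<p>Several of the problems listed above can be spotted with a few quick checks before any cleaning is done. The sketch below uses a small made-up DataFrame (the column names and values are assumptions for illustration, not taken from the datasets in this section).</p>

```python
import numpy as np
import pandas as pd

# Small made-up frame standing in for a real dataset (values are illustrative).
sample = pd.DataFrame({
    "state": ["Alabama", "Alaska", None, "Oregon"],
    "latitude": [np.nan, 61.1, 41.45, 45.3],
    "temperature": [78.0, 45.0, 6200.0, 55.0],  # 6200.0 looks like a magnitude error
})

# Count the missing values in each column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Check column types; numbers stored as text would not show a numeric dtype.
print(sample.dtypes)

# Summary statistics make unit or magnitude problems and outliers easier to spot.
print(sample.describe())
```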

In [None]:
# dropna: remove every row that has at least one missing value

bigfoot_cleaned = bigfoot.dropna()
%whos

In [13]:
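# Replace null values with a placeholder (here the value 999).
# inplace=True modifies bigfoot directly instead of returning a new DataFrame.

bigfoot.fillna(999, inplace=True)
bigfoot.head()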

Unnamed: 0,observed,location_details,county,state,season,title,latitude,longitude,date,number,...,moon_phase,precip_intensity,precip_probability,precip_type,pressure,summary,uv_index,visibility,wind_bearing,wind_speed
0,I was canoeing on the Sipsey river in Alabama....,999,Winston County,Alabama,Summer,999,999.0,999.0,999,30680,...,999.0,999.0,999.0,999,999.0,999,999.0,999.0,999.0,999.0
1,Ed L. was salmon fishing with a companion in P...,East side of Prince William Sound,Valdez-Chitina-Whittier County,Alaska,Fall,999,999.0,999.0,999,1261,...,999.0,999.0,999.0,999,999.0,999,999.0,999.0,999.0,999.0
2,"While attending U.R.I in the Fall of 1974,I wo...","Great swamp area, Narragansett Indians",Washington County,Rhode Island,Fall,Report 6496: Bicycling student has night encou...,41.45,-71.5,1974-09-20,6496,...,0.16,0.0,0.0,999,1020.61,Foggy until afternoon.,4.0,2.75,198.0,6.92
3,"Hello, My name is Doug and though I am very re...",I would rather not have exact location (listin...,York County,Pennsylvania,Summer,999,999.0,999.0,999,8000,...,999.0,999.0,999.0,999,999.0,999,999.0,999.0,999.0,999.0
4,It was May 1984. Two friends and I were up in ...,"Logging roads north west of Yamhill, OR, about...",Yamhill County,Oregon,Spring,999,999.0,999.0,999,703,...,999.0,999.0,999.0,999,999.0,999,999.0,999.0,999.0,999.0


In [17]:
# Filter out the rows where latitude holds the 999 placeholder.
# Series.filter() selects by index label, not by value, so use a boolean mask instead.

bigfoot = bigfoot[bigfoot["latitude"] != 999]

In [None]:
# Values: load the demo dataset and list its column names

import pandas as pd

demo = pd.read_csv('demo.csv')
demo.columns

In [None]:
# Recoding

# Inspect the values a variable takes before recoding them
# (see the pandas documentation for Series.value_counts).
import pandas as pd

demo["gender"].value_counts()  # reveals values that are not coded consistently

In [None]:
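# Changing the case of values

# demo["gender"].str.lower()  # returns a lowercased copy without modifying demo

demo["gender"] = demo["gender"].str.lower()  # assign back to keep the change

# demo["gender"] = demo["gender"].str.title()  # alternative: capitalize the first letter

demo["gender"].value_counts()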

In [None]:
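# Recode

# The values were lowercased above, so match on lowercase prefixes.
# startswith avoids matching the "m" inside "female"; na=False skips missing values.
demo.loc[demo["gender"].str.startswith("f", na=False), "gender"] = "Female"
demo.loc[demo["gender"].str.startswith("m", na=False), "gender"] = "Male"
demo["gender"].value_counts()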
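
<p>An alternative way to recode is a lookup table passed to <code>Series.map</code>. The sketch below uses made-up responses in place of <code>demo["gender"]</code>, and the name <code>recode_map</code> is hypothetical.</p>

```python
import pandas as pd

# Made-up responses standing in for demo["gender"] (illustrative only).
gender = pd.Series(["f", "M", "Female", "male", "F", None])

# Hypothetical lookup table: keys are the lowercased raw codes.
recode_map = {"f": "Female", "female": "Female", "m": "Male", "male": "Male"}

# Lowercase first so the lookup ignores case, then map each value.
recoded = gender.str.lower().map(recode_map)
print(recoded.value_counts(dropna=False))
```

<p>Values that do not appear in the lookup table become missing, which makes unexpected codes easy to find with <code>isna()</code>.</p>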

In [None]:
# Subset a single column (returns a Series)

gender = demo["gender"]
gender.head()

In [None]:
# Subset multiple columns (a list of names returns a DataFrame)

gender_income = demo[["gender", "income"]]
gender_income

In [None]:
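# Select rows by value: keep only rows where income is above 35

above_35 = demo[demo["income"] > 35]
above_35.mean(numeric_only=True)  # skip text columns such as gender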

In [None]:
# Sort by one column (ascending by default)

demo.sort_values(by="gender")

# Sort by several columns; ascending=False applies to both
demo.sort_values(by=['gender', 'income'], ascending=False).head()

In [None]:
# Pivot table: mean age for each combination of income (rows) and ed (columns)

demo.pivot_table(
    values="age", index="income", columns="ed", aggfunc="mean"
)
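
<p>To see what the pivot table call produces, here is a small sketch on made-up data that reuses the column names <code>income</code>, <code>ed</code>, and <code>age</code>; the values are assumptions chosen so the result is easy to check by hand.</p>

```python
import pandas as pd

# Made-up rows using the same column names as the demo dataset.
toy = pd.DataFrame({
    "income": [30, 30, 50, 50],
    "ed": [1, 2, 1, 1],
    "age": [25, 35, 40, 50],
})

# One row per income value, one column per ed value, cells hold the mean age.
table = toy.pivot_table(values="age", index="income", columns="ed", aggfunc="mean")
print(table)
```

<p>The two rows with income 50 and ed 1 are averaged into a single cell, and a combination with no rows (income 50, ed 2) is left as a missing value.</p>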



In [None]:
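# Write the DataFrame to a CSV file in the working directory.
# index=False leaves out the row index, so no extra "Unnamed: 0" column appears on re-import.

demo.to_csv("demo_from_python.csv", index=False)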