# Introduction to Pandas (Importing and Analyzing Data)

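In this notebook, we will walk through some basic features of the Python pandas library: importing a CSV file, inspecting the resulting dataframe, working with columns, and saving the result as a new CSV.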

In [7]:
import pandas as pd # "pd" is the conventional alias for pandas in the Python community
# with the alias, you type "pd.function()" instead of "pandas.function()"


## Import data

In [13]:
# read in a file called weather.csv
# store it in a variable called weather
weather = pd.read_csv("weather.csv")
weather # Jupyter renders the last expression in a cell as a formatted table, so "print" is not needed

Unnamed: 0,city,date,time,temp,precip
0,College Town,1/1/2004,00:45,52,False
1,College Town,1/2/2004,00:45,47,True
2,New Greenstown,1/1/2004,01:00,72,True
3,New Greenstown,1/2/2004,01:00,75,False
4,College Town,1/3/2004,00:45,49,True
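
The file weather.csv is not bundled with this text, so here is a minimal sketch for following along without it. It assumes the file holds exactly the five rows displayed above; the inline `csv_text` string is a hypothetical stand-in for the real file.

```python
import io
import pandas as pd

# Hypothetical inline copy of the five rows shown above (stand-in for weather.csv)
csv_text = """city,date,time,temp,precip
College Town,1/1/2004,00:45,52,False
College Town,1/2/2004,00:45,47,True
New Greenstown,1/1/2004,01:00,72,True
New Greenstown,1/2/2004,01:00,75,False
College Town,1/3/2004,00:45,49,True
"""

# read_csv accepts a file path or any file-like object, such as a StringIO buffer
weather = pd.read_csv(io.StringIO(csv_text))
print(weather.shape)  # (5, 5): five rows, five columns
```

The leftmost 0 to 4 in the displayed table is the row index that pandas generates automatically; it is not a column in the data.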


In [14]:
# the "head" method returns the first rows of the dataframe (default: 5)
weather.head(2) # in this case, we ask for the first 2 rows

Unnamed: 0,city,date,time,temp,precip
0,College Town,1/1/2004,00:45,52,False
1,College Town,1/2/2004,00:45,47,True


In [15]:
# "tail" returns the last rows of the dataframe (default: 5)
weather.tail(10) # the dataframe has only 5 rows, so all of them are returned

Unnamed: 0,city,date,time,temp,precip
0,College Town,1/1/2004,00:45,52,False
1,College Town,1/2/2004,00:45,47,True
2,New Greenstown,1/1/2004,01:00,72,True
3,New Greenstown,1/2/2004,01:00,75,False
4,College Town,1/3/2004,00:45,49,True
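
The output above contains all five rows because the dataframe is shorter than the ten rows requested. A small sketch of that behavior, using a hypothetical one-column dataframe built from the temp values shown above:

```python
import pandas as pd

# Hypothetical stand-in holding only the temp column from the table above
df = pd.DataFrame({"temp": [52, 47, 72, 75, 49]})

first_two = df.head(2)   # rows 0 and 1
last_two = df.tail(2)    # rows 3 and 4
all_rows = df.tail(10)   # asking for more rows than exist returns the whole frame

print(first_two["temp"].tolist())  # [52, 47]
print(last_two["temp"].tolist())   # [75, 49]
print(len(all_rows))               # 5
```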


## Learn more about the contents of "weather"

In [16]:
# what is the data type of "weather"?
type(weather)

pandas.core.frame.DataFrame

In [17]:
# how many rows are in the dataframe?
len(weather)

5

In [19]:
# Get basic statistics on the numeric columns of the dataframe (here, only "temp")
weather.describe()

Unnamed: 0,temp
count,5.0
mean,59.0
std,13.397761
min,47.0
25%,49.0
50%,52.0
75%,72.0
max,75.0
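
The figures above can be reproduced by hand, which helps clarify what each row means. The sketch below rebuilds the temp column as a standalone Series (an assumption made to keep the example self-contained) and shows that the reported std is the sample standard deviation, which divides by n - 1.

```python
import pandas as pd

# The five temp values from the table above
temp = pd.Series([52, 47, 72, 75, 49], name="temp")

stats = temp.describe()
print(stats["mean"])           # 59.0
print(round(stats["std"], 6))  # 13.397761

# describe() uses the sample standard deviation (ddof=1);
# the population version (ddof=0) is smaller
sample_std = temp.std(ddof=1)
population_std = temp.std(ddof=0)
print(sample_std > population_std)  # True
```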


In [20]:
# How much memory, in bytes, does each column use?
weather.memory_usage()

Index     80
city      40
date      40
time      40
temp      40
precip     5
dtype: int64
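
The 40 bytes reported for temp are five 8-byte integers, and the 5 bytes for precip are five 1-byte booleans. For columns stored as object dtype, the default figure counts only the references to the strings, not the strings themselves. A hedged sketch of the `deep=True` option, which also measures the string contents; the exact numbers depend on the pandas version and platform, so none are shown here.

```python
import pandas as pd

# Hypothetical three-column subset of the weather data shown above
weather = pd.DataFrame({
    "city": ["College Town", "College Town", "New Greenstown",
             "New Greenstown", "College Town"],
    "temp": [52, 47, 72, 75, 49],
    "precip": [False, True, True, False, True],
})

shallow = weather.memory_usage()        # default: fixed-size storage only
deep = weather.memory_usage(deep=True)  # also inspects string contents

print(shallow)
print(deep)
print(deep.sum())  # total bytes across the index and all columns
```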

In [21]:
# data types of each column in the dataframe
# note that "dtypes" is an attribute, not a method, so it takes no parentheses
weather.dtypes

city      object
date      object
time      object
temp       int64
precip      bool
dtype: object
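
In the output above, date is listed as object, which means the dates are held as plain strings rather than as dates. As an optional aside, here is a minimal sketch of converting such strings with `pd.to_datetime`; the month/day/year format is an assumption based on the values shown in the table.

```python
import pandas as pd

# Three of the date strings from the table above
dates = pd.Series(["1/1/2004", "1/2/2004", "1/3/2004"], name="date")

# Assumption: the strings are month/day/year
parsed = pd.to_datetime(dates, format="%m/%d/%Y")

print(parsed.dtype)             # a datetime64 dtype instead of object
print(parsed.dt.day.tolist())   # [1, 2, 3]
```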

## Working with columns

In [22]:
# display the contents of a column
weather['temp']

0    52
1    47
2    72
3    75
4    49
Name: temp, dtype: int64

In [23]:
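# another way to display the contents of a column
# (works only if the column name is a valid Python name and does not clash with a dataframe method)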
weather.temp

0    52
1    47
2    72
3    75
4    49
Name: temp, dtype: int64
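
The two forms above return the same result for temp, but the dot form does not work for every column name. A short sketch of where they differ; the columns "wind speed" and "count" are hypothetical and are not part of weather.csv.

```python
import pandas as pd

# Hypothetical dataframe with two awkward column names
df = pd.DataFrame({
    "temp": [52, 47],
    "wind speed": [5, 8],  # contains a space
    "count": [1, 2],       # same name as the DataFrame.count method
})

# For an ordinary name, both forms return the same Series
print(df["temp"].equals(df.temp))  # True

# A name with a space can only be reached with brackets
print(df["wind speed"].tolist())   # [5, 8]

# df.count is the method, not the column; brackets reach the column
print(callable(df.count))          # True
print(df["count"].tolist())        # [1, 2]
```

Bracket notation is the safer default, and it is the only form that works when creating a new column.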

In [25]:
# add a column called "blank" with empty values
weather['blank'] = "" # this fills every row in the column with "" (blank string)

In [26]:
# add a new column that doubles the value in the temp column
weather['doubled_temp'] = weather['temp'] * 2
weather

Unnamed: 0,city,date,time,temp,precip,blank,doubled_temp
0,College Town,1/1/2004,00:45,52,False,,104
1,College Town,1/2/2004,00:45,47,True,,94
2,New Greenstown,1/1/2004,01:00,72,True,,144
3,New Greenstown,1/2/2004,01:00,75,False,,150
4,College Town,1/3/2004,00:45,49,True,,98
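
Both assignments above rely on the same rule: a single value is repeated for every row, while an expression on a column is computed row by row. A minimal sketch of that rule; the "above_mean" column is a hypothetical addition for illustration.

```python
import pandas as pd

# The temp column from the table above
weather = pd.DataFrame({"temp": [52, 47, 72, 75, 49]})

weather["blank"] = ""                           # one value, repeated in every row
weather["doubled_temp"] = weather["temp"] * 2   # computed row by row

# Hypothetical extra column: is each temp above the column mean (59.0)?
weather["above_mean"] = weather["temp"] > weather["temp"].mean()

print(weather["doubled_temp"].tolist())  # [104, 94, 144, 150, 98]
print(weather["above_mean"].tolist())    # [False, False, True, True, False]
```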


In [27]:
# print the first three characters of each string in the "city" column
for city in weather['city']:
    print(city[:3])

Col
Col
New
New
Col
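
The loop above prints the prefixes but does not keep them. One alternative, sketched here, is the `.str` accessor, which applies the same slice to every value and returns a new Series that can be stored as a column. The standalone `cities` Series is rebuilt from the table above to keep the example self-contained.

```python
import pandas as pd

# The city column from the table above
cities = pd.Series(
    ["College Town", "College Town", "New Greenstown",
     "New Greenstown", "College Town"],
    name="city",
)

# .str[:3] slices each string, like city[:3] inside the loop
prefixes = cities.str[:3]
print(prefixes.tolist())  # ['Col', 'Col', 'New', 'New', 'Col']
```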


## Save the file as a new CSV

In [29]:
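# write the dataframe, including the new columns, to a new CSV file
# the row index is written as an extra first column unless you pass index=False
weather.to_csv("updated_weather.csv")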
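
By default, `to_csv` also writes the row index as an unnamed first column, and reading that file back produces an extra column called "Unnamed: 0". A minimal sketch of the difference that `index=False` makes; it writes to in-memory buffers rather than files, and the two-row dataframe is a hypothetical stand-in.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for the weather dataframe
weather = pd.DataFrame({"city": ["College Town", "New Greenstown"],
                        "temp": [52, 72]})

# Default: the index is written as an unnamed first column
with_index = io.StringIO()
weather.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()
print(cols_with_index)     # ['Unnamed: 0', 'city', 'temp']

# index=False: only the data columns are written
without_index = io.StringIO()
weather.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()
print(cols_without_index)  # ['city', 'temp']
```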