## Creating a data frame & loading it using pandas

### 1) Using pandas to read CSV file

In [2]:
import pandas as pd
weather_df = pd.read_csv("7-2_weather_data.csv")
weather_df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


### 2) Using pandas to convert dictionary (with a nested list) into data frame

In [3]:
import pandas as pd
weather_data = {
        'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
        'temperature': [32, 35, 28, 24, 32, 31],
        'windspeed': [6, 7, 2, 7, 4, 2],
        'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny']
}

weather_df2 = pd.DataFrame(weather_data)
weather_df2

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


### 3) Data frame attributes
#### Here is how you can retrieve the number of rows and columns from the data frame:


In [8]:
weather_df.shape
weather_df2.shape

(6, 4)

In [11]:
rows, columns = weather_df.shape
rows

6

In [10]:
columns

4

### 4) If you only want to print a couple of rows from the data:

#### Option 1: Heads

In [15]:
weather_df.head(2)

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny


#### Option 2: Tails

In [14]:
weather_df.tail(3)

Unnamed: 0,day,temperature,windspeed,event
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


#### Option 3: Index (Printing a section)

In [17]:
weather_df[1:4]

Unnamed: 0,day,temperature,windspeed,event
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow


#### If you only want to display data from specific column:

In [18]:
weather_df.day

0    1/1/2017
1    1/2/2017
2    1/3/2017
3    1/4/2017
4    1/5/2017
5    1/6/2017
Name: day, dtype: object

In [19]:
weather_df['windspeed']

0    6
1    7
2    2
3    7
4    4
5    2
Name: windspeed, dtype: int64

### 5) Basic data analysis

#### You can calculate basic stats like max, min, mean

In [21]:
weather_df['windspeed'].max()

7

#### You can calculate a range of stats for a single column

In [22]:
weather_df['windspeed'].describe()

count    6.000000
mean     4.666667
std      2.338090
min      2.000000
25%      2.500000
50%      5.000000
75%      6.750000
max      7.000000
Name: windspeed, dtype: float64

#### Or multiple columns!

In [24]:
weather_df[['windspeed', 'temperature']].describe()

Unnamed: 0,windspeed,temperature
count,6.0,6.0
mean,4.666667,30.333333
std,2.33809,3.829708
min,2.0,24.0
25%,2.5,28.75
50%,5.0,31.5
75%,6.75,32.0
max,7.0,35.0


#### conditional selections - based on SQL language?

In [4]:
weather_df[weather_df.temperature=>25]

SyntaxError: invalid syntax (<ipython-input-4-00b0c482eebc>, line 1)

In [16]:
weather_df[weather_df.temperature==weather_df.temperature(max)]

TypeError: 'Series' object is not callable

#### ^obviously those did not work!

In [19]:
weather_df[weather_df['temperature'] =>2]

SyntaxError: invalid syntax (<ipython-input-19-4db99bdc0ee3>, line 1)

solutions: it's all about synatx! make sure that when using comparison operators, that the = sign goes AFTER.

In [5]:
weather_df[weather_df.temperature>=25]

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


In [6]:
weather_df[weather_df['windspeed'] >=2]

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


In [9]:
weather_df[weather_df['temperature'] == weather_df['temperature'].max()]

Unnamed: 0,day,temperature,windspeed,event
1,1/2/2017,35,7,Sunny


In [10]:
weather_df[weather_df['temperature'] == weather_df['temperature'].min()]

Unnamed: 0,day,temperature,windspeed,event
3,1/4/2017,24,7,Snow


#### moving on to indexes...basically, a way of sorting your data.
#### the default index is numbers, but you can reset the index to the values of a different column
#### and then use "loc" method to select rows based on the column data

In [52]:
weather_df.index
weather_df.index('temperature', inplace=True)

TypeError: 'Int64Index' object is not callable

In [53]:
weather_df

Unnamed: 0_level_0,event
temperature,Unnamed: 1_level_1
32,Rain
35,Sunny
28,Snow
24,Snow
32,Rain
31,Sunny


In [56]:
weather_df.loc[32]

Unnamed: 0_level_0,event
temperature,Unnamed: 1_level_1
32,Rain
32,Rain


#### if you want to reset the index:

In [59]:
weather_df.reset_index(inplace=True)

In [11]:
weather_df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


#### sorting the data:
#### use the sort_values() method. You can take out a specific column and sort it, or sort the entire data set by a specific column. 
#### ascending= true sorts by ascending, ascending = false sorts by descending.

In [12]:
weather_df['temperature'].sort_values(ascending=True)

3    24
2    28
5    31
0    32
4    32
1    35
Name: temperature, dtype: int64

In [13]:
weather_df.sort_values(['temperature'], ascending=True)

Unnamed: 0,day,temperature,windspeed,event
3,1/4/2017,24,7,Snow
2,1/3/2017,28,2,Snow
5,1/6/2017,31,2,Sunny
0,1/1/2017,32,6,Rain
4,1/5/2017,32,4,Rain
1,1/2/2017,35,7,Sunny


In [14]:
weather_df.sort_values(['temperature'], ascending=False)

Unnamed: 0,day,temperature,windspeed,event
1,1/2/2017,35,7,Sunny
0,1/1/2017,32,6,Rain
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
