# Intro to Pandas

## Data Frames

#### Basic Imports

In [2]:
# Basic imports
import numpy as np
from pandas import Series, DataFrame
import pandas as pd

### Data Frame from dict

1. Create a Data Frame from the dictionary below

In [3]:
# Setting the data for the df
data = {'Lisbon':[115,222,345,421], 'Rome':[231, 432, 489, 129], 'London':[654,543,632,233]}

In [6]:
# Create the df
df = DataFrame(data)
df


|   | Lisbon | Rome | London |
|---|--------|------|--------|
| 0 | 115 | 231 | 654 |
| 1 | 222 | 432 | 543 |
| 2 | 345 | 489 | 632 |
| 3 | 421 | 129 | 233 |
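
As a minimal sketch of how the constructor treats a dictionary: each key becomes a column label and each list becomes that column's values, with a default integer index. The `cities` dictionary below is illustrative stand-in data, not part of the exercise.

```python
import pandas as pd

# Hypothetical stand-in data: keys become column labels, lists become column values
cities = {'Lisbon': [115, 222], 'Rome': [231, 432]}
cities_df = pd.DataFrame(cities)

print(cities_df.columns.tolist())  # column labels come from the dict keys
print(cities_df.shape)             # (rows, columns)
```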


### Data Frame from numpy array

2. Use the `np.random.randn()` function to create a 5x5 DF. Set the column names to ('Z', 'Y', 'Z', 'W', 'T').

In [11]:
# Data from array
data = np.random.randn(25).reshape(5,5)


In [13]:
# DF creation
np_df = DataFrame(data, columns = ['Z', 'Y', 'Z','W','T'])

# show
np_df

|   | Z | Y | Z | W | T |
|---|---|---|---|---|---|
| 0 | -0.521549 | 0.018261 | 0.17409 | 1.092777 | -1.129938 |
| 1 | -0.228555 | -0.554758 | 2.169089 | -0.360746 | -1.56658 |
| 2 | -0.992403 | 1.553021 | -0.567884 | -1.058811 | 1.030792 |
| 3 | 0.368701 | -0.136186 | -0.903002 | -0.185166 | 1.517247 |
| 4 | 0.512276 | -0.267212 | -0.631269 | 0.241593 | -1.99819 |
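
Note that the column list in this exercise repeats the label 'Z'. pandas accepts duplicate labels, but selecting 'Z' by name then returns both matching columns as a DataFrame rather than a single Series. The sketch below illustrates this; the seeded generator `default_rng(0)` is an assumption made for reproducibility, whereas the exercise itself uses `np.random.randn`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)          # seeded so the sketch is reproducible
values = rng.standard_normal((5, 5))    # 5x5 array of standard-normal samples
dup_df = pd.DataFrame(values, columns=['Z', 'Y', 'Z', 'W', 'T'])

# With a duplicated label, selection by name returns every matching column
print(type(dup_df['Z']).__name__, dup_df['Z'].shape)
print(type(dup_df['Y']).__name__, dup_df['Y'].shape)
```

If single-column selection matters later on, unique column labels are the safer choice.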


### Data Frame from Excel

From now on, we will work with the "premier_league.xlsx" file.

In [15]:
premier_df = pd.read_excel('../../Data/premier_league.xlsx')
premier_df

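|   | Pos | Team | Pld | W | D | L | GF | GA | GD | Pts |
|---|-----|------|-----|---|---|---|----|----|----|-----|
| 0 | 1 | Manchester City | 38 | 32 | 2 | 4 | 95 | 23 | 72 | 98 |
| 1 | 2 | Liverpool | 38 | 30 | 7 | 1 | 89 | 22 | 67 | 97 |
| 2 | 3 | Chelsea | 38 | 21 | 9 | 8 | 63 | 39 | 24 | 72 |
| 3 | 4 | Tottenham Hotspur | 38 | 23 | 2 | 13 | 67 | 39 | 28 | 71 |
| 4 | 5 | Arsenal | 38 | 21 | 7 | 10 | 73 | 51 | 22 | 70 |
| 5 | 6 | Manchester United | 38 | 19 | 9 | 10 | 65 | 54 | 11 | 66 |
| 6 | 7 | Wolverhampton Wanderers | 38 | 16 | 9 | 13 | 47 | 46 | 1 | 57 |
| 7 | 8 | Everton | 38 | 15 | 9 | 14 | 54 | 46 | 8 | 54 |
| 8 | 9 | Leicester City | 38 | 15 | 7 | 16 | 51 | 48 | 3 | 52 |
| 9 | 10 | West Ham United | 38 | 15 | 7 | 16 | 52 | 55 | −3 | 52 |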


#### Data Frame description:
* **Pos:** Rank position
* **Team:** Team Name
* **Pld:** Games Played
* **W:** Games Won
* **D:** Games Drawn
* **L:** Games Lost
* **GF:** Goals For
* **GA:** Goals Against
* **GD:** Goal Difference
* **Pts:** Points

## Basic DF information

3. Print the info about the structure of the DF

In [18]:
# Info on the DF structure
premier_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 10 columns):
Pos     10 non-null int64
Team    10 non-null object
Pld     10 non-null int64
W       10 non-null int64
D       10 non-null int64
L       10 non-null int64
GF      10 non-null int64
GA      10 non-null int64
GD      10 non-null object
Pts     10 non-null int64
dtypes: int64(8), object(2)
memory usage: 880.0+ bytes
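
In the `info()` output, `GD` is reported as `object` rather than `int64`. A likely cause is the value `−3` in the last row, which appears to be written with a Unicode minus sign (U+2212) instead of an ASCII hyphen, so it is stored as text. Below is a minimal sketch of one possible cleanup, using a hypothetical stand-in Series rather than the real file:

```python
import pandas as pd

# Hypothetical stand-in for the GD column: integers mixed with a Unicode-minus string
gd = pd.Series([72, 67, '\u22123'], name='GD')
print(gd.dtype)  # object, because the values have mixed types

# Normalize to strings, swap U+2212 for an ASCII hyphen, then convert to numbers
gd_clean = pd.to_numeric(gd.astype(str).str.replace('\u2212', '-', regex=False))
print(gd_clean.tolist())
```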


#### Head and Tail
4. Print the first 4 rows of the df

In [19]:
# First 4 rows
premier_df.head(4)

|   | Pos | Team | Pld | W | D | L | GF | GA | GD | Pts |
|---|-----|------|-----|---|---|---|----|----|----|-----|
| 0 | 1 | Manchester City | 38 | 32 | 2 | 4 | 95 | 23 | 72 | 98 |
| 1 | 2 | Liverpool | 38 | 30 | 7 | 1 | 89 | 22 | 67 | 97 |
| 2 | 3 | Chelsea | 38 | 21 | 9 | 8 | 63 | 39 | 24 | 72 |
| 3 | 4 | Tottenham Hotspur | 38 | 23 | 2 | 13 | 67 | 39 | 28 | 71 |


5. Print the last 3 rows of the df

In [24]:
# Last 3 rows
premier_df.tail(3)

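|   | Pos | Team | Pld | W | D | L | GF | GA | GD | Pts |
|---|-----|------|-----|---|---|---|----|----|----|-----|
| 7 | 8 | Everton | 38 | 15 | 9 | 14 | 54 | 46 | 8 | 54 |
| 8 | 9 | Leicester City | 38 | 15 | 7 | 16 | 51 | 48 | 3 | 52 |
| 9 | 10 | West Ham United | 38 | 15 | 7 | 16 | 52 | 55 | −3 | 52 |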
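
Both `head()` and `tail()` take an optional row count `n`, which defaults to 5 when omitted. A small sketch on a hypothetical ten-row frame (`demo_df` is not part of the exercise):

```python
import pandas as pd

# Hypothetical stand-in frame with ten rows
demo_df = pd.DataFrame({'Pos': range(1, 11)})

print(len(demo_df.head()))             # no argument: default row count
print(len(demo_df.head(4)))            # first 4 rows
print(demo_df.tail(3).index.tolist())  # index labels of the last 3 rows
```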


6. Print the df index

In [26]:
# DF index
premier_df.index

RangeIndex(start=0, stop=10, step=1)

7. Print the column names

In [29]:
# Column names
for col in premier_df.columns: 
    print(col)

Pos
Team
Pld
W
D
L
GF
GA
GD
Pts
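
Besides looping, the index and column labels can be converted to plain Python lists, which is often handy for inspection. A minimal sketch on hypothetical data:

```python
import pandas as pd

demo_df = pd.DataFrame({'Team': ['A', 'B'], 'Pts': [3, 1]})  # hypothetical data

column_names = demo_df.columns.tolist()  # Index of labels -> plain Python list
row_labels = list(demo_df.index)         # RangeIndex -> list of integer labels
print(column_names, row_labels)
```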


8. Print the column **'Pts'** using the **[ ]** method.

In [42]:
# Column
premier_df['Pts']

0    98
1    97
2    72
3    71
4    70
5    66
6    57
7    54
8    52
9    52
Name: Pts, dtype: int64

9. Print the columns **'Team','Pts'** using the **[ ]** method.

In [35]:
# Columns
premier_df[['Team','Pts']]

|   | Team | Pts |
|---|------|-----|
| 0 | Manchester City | 98 |
| 1 | Liverpool | 97 |
| 2 | Chelsea | 72 |
| 3 | Tottenham Hotspur | 71 |
| 4 | Arsenal | 70 |
| 5 | Manchester United | 66 |
| 6 | Wolverhampton Wanderers | 57 |
| 7 | Everton | 54 |
| 8 | Leicester City | 52 |
| 9 | West Ham United | 52 |


10. Print the column **'Pts'** from index 2 to 5

In [40]:
# Individual column combined with index
premier_df[2:6]['Pts']

2    72
3    71
4    70
5    66
Name: Pts, dtype: int64
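
The slice `[2:6]` is positional, so its stop value is excluded. An alternative is `.loc`, which slices by label and includes the stop value. The sketch below compares the two on a hypothetical frame with the default integer index:

```python
import pandas as pd

demo_df = pd.DataFrame({'Pts': [98, 97, 72, 71, 70, 66, 57]})  # hypothetical data

by_position = demo_df[2:6]['Pts']   # positional slice: stop (6) is excluded
by_label = demo_df.loc[2:5, 'Pts']  # label slice: stop (5) is included

print(by_position.equals(by_label))
```

The `.loc` form selects rows and column in a single step, which avoids chained indexing.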

11. Print the column **'Pts'** at the specific index labels 1 and 7

In [44]:
# Specific index
premier_df.loc[[1, 7], 'Pts']

1    97
7    54
Name: Pts, dtype: int64
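
For reference, `.loc` and `.iloc` are indexers used with square brackets: `.loc` selects by label and `.iloc` by integer position. Passing a bare list such as `[1, 7]` to plain `[ ]` is interpreted as a list of column labels, not rows. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical stand-in frame with the default integer index
demo_df = pd.DataFrame({'Team': list('ABCDEFGH'),
                        'Pts': [98, 97, 72, 71, 70, 66, 57, 54]})

picked = demo_df.loc[[1, 7], 'Pts']  # rows labelled 1 and 7, column 'Pts'

# Same selection by position: look up the integer position of the column first
pts_position = demo_df.columns.get_loc('Pts')
same_by_position = demo_df.iloc[[1, 7], pts_position]

print(picked.tolist())
```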

#### Multiple data columns
12. Grab the columns **['Team','Pld','GF','GA']** using the following method (the `DataFrame()` constructor)

In [22]:
# Specific data columns
DataFrame()

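|   | Team | Pld | GF | GA |
|---|------|-----|----|----|
| 0 | Manchester City | 38 | 95 | 23 |
| 1 | Liverpool | 38 | 89 | 22 |
| 2 | Chelsea | 38 | 63 | 39 |
| 3 | Tottenham Hotspur | 38 | 67 | 39 |
| 4 | Arsenal | 38 | 73 | 51 |
| 5 | Manchester United | 38 | 65 | 54 |
| 6 | Wolverhampton Wanderers | 38 | 47 | 46 |
| 7 | Everton | 38 | 54 | 46 |
| 8 | Leicester City | 38 | 51 | 48 |
| 9 | West Ham United | 38 | 52 | 55 |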


13. Use the same method as in Ex. 12 to grab the columns **['Team','Pts','W']** at index (2, 4, 7)

In [23]:
DataFrame()

|   | Team | Pts | W |
|---|------|-----|---|
| 2 | Chelsea | 72 | 21 |
| 4 | Arsenal | 70 | 21 |
| 7 | Everton | 54 | 15 |
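
When an existing DataFrame is passed to the `DataFrame()` constructor together with `columns` and/or `index`, the result is reindexed to just those labels. The sketch below shows the idea on a hypothetical `source_df`; it is one way to do the selection, not necessarily the only one intended by the exercise.

```python
from pandas import DataFrame

# Hypothetical stand-in frame
source_df = DataFrame({'Team': ['A', 'B', 'C'],
                       'Pts': [9, 6, 3],
                       'W': [3, 2, 1]})

# Keep only the listed columns and the listed row labels
subset = DataFrame(source_df, columns=['Team', 'Pts'], index=[0, 2])
print(subset)
```

The more common spelling of the same selection is `source_df.loc[[0, 2], ['Team', 'Pts']]`.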


### Grab rows 

14. Print the row with index 5

In [24]:
# Row indexing


15. Grab rows from index 0 to 5

In [25]:
# Slicing rows


16. Grab rows with index 3,6,8

In [26]:
# Specific index rows
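
As a hedged illustration of the three row-selection patterns in this section, the sketch below uses `.loc` on a hypothetical ten-row frame. With the default integer index, a single label returns a Series, a label slice includes its stop value, and a list of labels returns the matching rows.

```python
import pandas as pd

# Hypothetical stand-in frame with ten rows and the default integer index
demo_df = pd.DataFrame({'Team': list('ABCDEFGHIJ'),
                        'Pts': range(10, 0, -1)})

one_row = demo_df.loc[5]         # single label -> Series holding that row
first_rows = demo_df.loc[0:5]    # label slice -> rows 0 through 5 inclusive
chosen = demo_df.loc[[3, 6, 8]]  # list of labels -> only those rows

print(one_row['Team'], len(first_rows), chosen.index.tolist())
```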


### Adding new column
17. Use the premier_df and add a new column **"Coach"**. Assign to the new column the value **"Ferguson"**

In [27]:
# New column


In [29]:
premier_df

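|   | Pos | Team | Pld | W | D | L | GF | GA | GD | Pts | Coach |
|---|-----|------|-----|---|---|---|----|----|----|-----|-------|
| 0 | 1 | Manchester City | 38 | 32 | 2 | 4 | 95 | 23 | 72 | 98 | Ferguson |
| 1 | 2 | Liverpool | 38 | 30 | 7 | 1 | 89 | 22 | 67 | 97 | Ferguson |
| 2 | 3 | Chelsea | 38 | 21 | 9 | 8 | 63 | 39 | 24 | 72 | Ferguson |
| 3 | 4 | Tottenham Hotspur | 38 | 23 | 2 | 13 | 67 | 39 | 28 | 71 | Ferguson |
| 4 | 5 | Arsenal | 38 | 21 | 7 | 10 | 73 | 51 | 22 | 70 | Ferguson |
| 5 | 6 | Manchester United | 38 | 19 | 9 | 10 | 65 | 54 | 11 | 66 | Ferguson |
| 6 | 7 | Wolverhampton Wanderers | 38 | 16 | 9 | 13 | 47 | 46 | 1 | 57 | Ferguson |
| 7 | 8 | Everton | 38 | 15 | 9 | 14 | 54 | 46 | 8 | 54 | Ferguson |
| 8 | 9 | Leicester City | 38 | 15 | 7 | 16 | 51 | 48 | 3 | 52 | Ferguson |
| 9 | 10 | West Ham United | 38 | 15 | 7 | 16 | 52 | 55 | −3 | 52 | Ferguson |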
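
Assigning a single scalar to a new column label broadcasts that value to every row. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

demo_df = pd.DataFrame({'Team': ['A', 'B', 'C']})  # hypothetical data

# A scalar on the right-hand side is repeated for every row
demo_df['Coach'] = 'Ferguson'
print(demo_df)
```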


#### Adding Series to Data Frame

18. Given the following list **coach_data**, create a Series **"coach_name"**. When you create the Series, specify the index labels of the rows where you want the coach names to go.

In [45]:
coach_data =['Mourinho', 'Ferguson', 'Conte','Ranieri']

In [None]:
#Adding a Series to a DataFrame
coach_name = Series(data= ,index= )
coach_name

19. Add the coach_name Series data to the 'Coach' column

In [30]:
# Now add the Series to the premier_df DataFrame


#Show
premier_df
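
When a Series is assigned to a column, pandas aligns it with the DataFrame by index label rather than by order, and rows without a matching label are left as NaN. The sketch below demonstrates this on hypothetical data; the names and index values are illustrative only.

```python
import pandas as pd

demo_df = pd.DataFrame({'Team': ['A', 'B', 'C', 'D', 'E']})  # hypothetical data

# The Series index says which rows receive which value (order does not matter)
coach_name = pd.Series(['Mourinho', 'Ferguson'], index=[4, 0])
demo_df['Coach'] = coach_name

print(demo_df)
```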

20. Delete the 'Coach' column

In [32]:
# Delete columns


premier_df
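
Two common ways to remove a column are `drop(columns=...)`, which returns a new DataFrame and leaves the original untouched, and `del`, which modifies the DataFrame in place. A minimal sketch on hypothetical data:

```python
import pandas as pd

demo_df = pd.DataFrame({'Team': ['A', 'B'], 'Coach': ['X', 'Y']})  # hypothetical data

without_coach = demo_df.drop(columns='Coach')  # new frame; demo_df still has 'Coach'
still_there = 'Coach' in demo_df.columns

del demo_df['Coach']                           # removes the column in place
print(without_coach.columns.tolist(), demo_df.columns.tolist())
```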

## Congratulations, you completed all the exercises!