# Exploring Data (Manipulation) of National Basketball Association (NBA) games from 2004 to 2020

## Context:
- The data is about NBA (National Basketball Association) games from 2004 season to Dec, 2020.
- Focusses on importing data and data manipulation techniques.
- Original source (https://www.kaggle.com/nathanlauga/nba-games).
- Dataset has been slightly modified 

## Dataset Description:

Two datasets (in two separate csv files):

  - **games**: each game from 2004 season to Dec 2020, including information about the two teams in each game, and some details like number of points, etc
  - **teams**: information about each team played in the games
  
Assume we want to study the game level data, but with detailed information about each team. We'll need to combine these two datasets together.

## Objective: 
   - Load/examine/subset/rename/change dtypes of columns for each individual dataset
   - Combine them into a single dataset, and export it
   - Explore the final dataset by subsetting or sorting

### 1. Import the libraries

In [1]:
import pandas as pd

### 2. Load the data in `games.csv` as a DataFrame called `games`

Save the csv file under the same directory as the notebook if not typing the full path.

In [2]:
games = pd.read_csv('games.csv')

### 3. Look at the first 5 rows of the DataFrame

In [3]:
games.head()

Unnamed: 0,GAME_DATE,GAME_ID,GAME_STATUS_TEXT,HOME_TEAM_ID,VISITOR_TEAM_ID,SEASON,TEAM_ID_home,POINTS_home,FG_PCT_home,FT_PCT_home,...,AST_home,REB_home,TEAM_ID_away,POINTS_away,FG_PCT_away,FT_PCT_away,FG3_PCT_away,AST_away,REB_away,HOME_TEAM_WINS
0,2020-12-19,12000047,Final,1610612753,1610612766,2020,1610612753,120,0.433,0.792,...,23,50,1610612766,117,0.444,0.864,0.439,21,52,1
1,2020-12-19,12000048,Final,1610612764,1610612765,2020,1610612764,99,0.427,0.625,...,24,45,1610612765,96,0.402,0.647,0.326,18,51,1
2,2020-12-19,12000049,Final,1610612763,1610612737,2020,1610612763,116,0.4,0.744,...,21,43,1610612737,117,0.422,0.837,0.297,24,47,0
3,2020-12-18,12000039,Final,1610612754,1610612755,2020,1610612754,107,0.371,0.692,...,19,45,1610612755,113,0.533,0.629,0.355,23,48,0
4,2020-12-18,12000040,Final,1610612761,1610612748,2020,1610612761,105,0.38,0.737,...,27,37,1610612748,117,0.534,0.741,0.514,30,51,0


### 4. Look at the columns of the DataFrame

In [4]:
games.columns

Index(['GAME_DATE', 'GAME_ID', 'GAME_STATUS_TEXT', 'HOME_TEAM_ID',
       'VISITOR_TEAM_ID', 'SEASON', 'TEAM_ID_home', 'POINTS_home',
       'FG_PCT_home', 'FT_PCT_home', 'FG3_PCT_home', 'AST_home', 'REB_home',
       'TEAM_ID_away', 'POINTS_away', 'FG_PCT_away', 'FT_PCT_away',
       'FG3_PCT_away', 'AST_away', 'REB_away', 'HOME_TEAM_WINS'],
      dtype='object')

### 5. Reassign `games` as its subset of the columns 'GAME_DATE', 'GAME_STATUS_TEXT', 'HOME_TEAM_ID', 'TEAM_ID_away', 'POINTS_home', 'POINTS_away', 'HOME_TEAM_WINS'

Extract only some columns about the games

In [5]:
games= games[['GAME_DATE','GAME_STATUS_TEXT','HOME_TEAM_ID','TEAM_ID_away','POINTS_home','POINTS_away','HOME_TEAM_WINS']]

### 6. Look at the new `games` DataFrame's first 5 rows, and info summary

In [6]:
games.head()

Unnamed: 0,GAME_DATE,GAME_STATUS_TEXT,HOME_TEAM_ID,TEAM_ID_away,POINTS_home,POINTS_away,HOME_TEAM_WINS
0,2020-12-19,Final,1610612753,1610612766,120,117,1
1,2020-12-19,Final,1610612764,1610612765,99,96,1
2,2020-12-19,Final,1610612763,1610612737,116,117,0
3,2020-12-18,Final,1610612754,1610612755,107,113,0
4,2020-12-18,Final,1610612761,1610612748,105,117,0


In [7]:
games.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23421 entries, 0 to 23420
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   GAME_DATE         23421 non-null  object
 1   GAME_STATUS_TEXT  23421 non-null  object
 2   HOME_TEAM_ID      23421 non-null  int64 
 3   TEAM_ID_away      23421 non-null  int64 
 4   POINTS_home       23421 non-null  int64 
 5   POINTS_away       23421 non-null  int64 
 6   HOME_TEAM_WINS    23421 non-null  int64 
dtypes: int64(5), object(2)
memory usage: 1.3+ MB


### 7. Convert `GAME_DATE` to a `datetime` dtype

In [8]:
games['GAME_DATE'] = pd.to_datetime(games['GAME_DATE'])

### 8. Convert `GAME_STATUS_TEXT` to a `string` dtype

In [9]:
games['GAME_STATUS_TEXT'] = games['GAME_STATUS_TEXT'].astype('string')

### 9. Look at the head and info summary of the DataFrame to verify the changes

In [10]:
games.head()

Unnamed: 0,GAME_DATE,GAME_STATUS_TEXT,HOME_TEAM_ID,TEAM_ID_away,POINTS_home,POINTS_away,HOME_TEAM_WINS
0,2020-12-19,Final,1610612753,1610612766,120,117,1
1,2020-12-19,Final,1610612764,1610612765,99,96,1
2,2020-12-19,Final,1610612763,1610612737,116,117,0
3,2020-12-18,Final,1610612754,1610612755,107,113,0
4,2020-12-18,Final,1610612761,1610612748,105,117,0


In [11]:
games.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23421 entries, 0 to 23420
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   GAME_DATE         23421 non-null  datetime64[ns]
 1   GAME_STATUS_TEXT  23421 non-null  string        
 2   HOME_TEAM_ID      23421 non-null  int64         
 3   TEAM_ID_away      23421 non-null  int64         
 4   POINTS_home       23421 non-null  int64         
 5   POINTS_away       23421 non-null  int64         
 6   HOME_TEAM_WINS    23421 non-null  int64         
dtypes: datetime64[ns](1), int64(5), string(1)
memory usage: 1.3 MB


### 10. Load the data in `teams.csv` as a DataFrame called `teams`, and look at its first 5 rows, and its columns

In [12]:
teams = pd.read_csv('teams.csv')
teams.head()

Unnamed: 0,LEAGUE_ID,TEAM_ID,MIN_YEAR,MAX_YEAR,ABBREVIATION,NICKNAME,YEARFOUNDED,CITY,ARENA,ARENACAPACITY,OWNER,GENERALMANAGER,HEADCOACH,DLEAGUEAFFILIATION
0,0,1610612737,1949,2019,ATL,Hawks,1949,Atlanta,State Farm Arena,18729.0,Tony Ressler,Travis Schlenk,Lloyd Pierce,Erie Bayhawks
1,0,1610612738,1946,2019,BOS,Celtics,1946,Boston,TD Garden,18624.0,Wyc Grousbeck,Danny Ainge,Brad Stevens,Maine Red Claws
2,0,1610612740,2002,2019,NOP,Pelicans,2002,New Orleans,Smoothie King Center,,Tom Benson,Trajan Langdon,Alvin Gentry,No Affiliate
3,0,1610612741,1966,2019,CHI,Bulls,1966,Chicago,United Center,21711.0,Jerry Reinsdorf,Gar Forman,Jim Boylen,Windy City Bulls
4,0,1610612742,1980,2019,DAL,Mavericks,1980,Dallas,American Airlines Center,19200.0,Mark Cuban,Donnie Nelson,Rick Carlisle,Texas Legends


In [13]:
teams.columns

Index(['LEAGUE_ID', 'TEAM_ID', 'MIN_YEAR', 'MAX_YEAR', 'ABBREVIATION',
       'NICKNAME', 'YEARFOUNDED', 'CITY', 'ARENA', 'ARENACAPACITY', 'OWNER',
       'GENERALMANAGER', 'HEADCOACH', 'DLEAGUEAFFILIATION'],
      dtype='object')

### 11. Reassign `teams` as a subset of its columns 'TEAM_ID', 'CITY', 'NICKNAME', and look at its first 5 rows and info summary

Extract only some columns about the teams

In [14]:
teams = teams[['TEAM_ID','CITY','NICKNAME']]
teams.head()

Unnamed: 0,TEAM_ID,CITY,NICKNAME
0,1610612737,Atlanta,Hawks
1,1610612738,Boston,Celtics
2,1610612740,New Orleans,Pelicans
3,1610612741,Chicago,Bulls
4,1610612742,Dallas,Mavericks


In [15]:
teams.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   TEAM_ID   30 non-null     int64 
 1   CITY      30 non-null     object
 2   NICKNAME  30 non-null     object
dtypes: int64(1), object(2)
memory usage: 848.0+ bytes


### 12. Convert both columns `CITY` and `NICKNAME` to a `string` dtype

In [16]:
teams = teams.astype({'CITY': 'string', 'NICKNAME': 'string'})

### 13. Verify the changes with the `dtypes` attribute

In [17]:
teams.dtypes

TEAM_ID      int64
CITY        string
NICKNAME    string
dtype: object

### 14. Print out the first two rows of `games` and `teams`, how can we combine them?

In [18]:
games.head(2)

Unnamed: 0,GAME_DATE,GAME_STATUS_TEXT,HOME_TEAM_ID,TEAM_ID_away,POINTS_home,POINTS_away,HOME_TEAM_WINS
0,2020-12-19,Final,1610612753,1610612766,120,117,1
1,2020-12-19,Final,1610612764,1610612765,99,96,1


In [19]:
teams.head(2)

Unnamed: 0,TEAM_ID,CITY,NICKNAME
0,1610612737,Atlanta,Hawks
1,1610612738,Boston,Celtics


*Note:*

*Within the `games` DataFrame, there are two columns `HOME_TEAM_ID` and `TEAM_ID_away`. This is because each game involves two teams playing against each other. The team that played in its own location, is called the 'home' team, the team that played outside its location, is called the 'away' team. Each game has one 'home' team and one 'away' team.*

*While the `teams` DataFrame stores the information about each team, the identifier for each team is the column `TEAM_ID`.*

*We can merge the two DataFrames based on:*

- *`TEAM_ID_home` in `games` and `TEAM_ID` in `teams`: to get the team information for the 'home' team*
- *`TEAM_ID_away` in `games` and `TEAM_ID` in `teams`: to get the team information for the 'away' team*

### 15. Merge (inner) `games` and `teams` based on 'HOME_TEAM_ID' and 'TEAM_ID', call the merged DataFrame games_with_home_team

In [20]:
games_with_home_team = pd.merge(left=games,right=teams,how='inner',left_on='HOME_TEAM_ID',right_on='TEAM_ID')

### 16. Print out the first 5 rows of the new DataFrame

Since we used the column `HOME_TEAM_ID` when merging, the two columns `CITY` and `NICKNAME` are storing the city and nickname of the 'home' team. 
So let's rename them.

In [22]:
games_with_home_team.head()

Unnamed: 0,GAME_DATE,GAME_STATUS_TEXT,HOME_TEAM_ID,TEAM_ID_away,POINTS_home,POINTS_away,HOME_TEAM_WINS,TEAM_ID,CITY,NICKNAME
0,2020-12-19,Final,1610612753,1610612766,120,117,1,1610612753,Orlando,Magic
1,2020-12-17,Final,1610612753,1610612766,115,123,0,1610612753,Orlando,Magic
2,2020-08-24,Final,1610612753,1610612749,106,121,0,1610612753,Orlando,Magic
3,2020-08-22,Final,1610612753,1610612749,107,121,0,1610612753,Orlando,Magic
4,2020-08-13,Final,1610612753,1610612740,133,127,1,1610612753,Orlando,Magic


### 17. Rename the column `CITY` as 'city_home', `NICKNAME` as 'nickname_home'

In [23]:
games_with_home_team = games_with_home_team.rename(columns={'CITY':'city_home','NICKNAME':'nickname_home'})

### 18. Merge (inner) `games_with_home_team` and `teams` based on 'TEAM_ID_away' and 'TEAM_ID', call the merged DataFrame `games_with_both_teams`

In [24]:
games_with_both_teams = pd.merge(left=games,right=teams,how='inner',left_on='TEAM_ID_away',right_on='TEAM_ID')

### 19. Print out the first two rows of the new DataFrame

Since we used the column `TEAM_ID_away` when merging, the two columns `CITY` and `NICKNAME` are storing the city and nickname of the 'away' team. 
So let's rename them.

In [25]:
games_with_both_teams.head()

Unnamed: 0,GAME_DATE,GAME_STATUS_TEXT,HOME_TEAM_ID,TEAM_ID_away,POINTS_home,POINTS_away,HOME_TEAM_WINS,TEAM_ID,CITY,NICKNAME
0,2020-12-19,Final,1610612753,1610612766,120,117,1,1610612766,Charlotte,Hornets
1,2020-12-17,Final,1610612753,1610612766,115,123,0,1610612766,Charlotte,Hornets
2,2020-03-11,Final,1610612748,1610612766,98,109,0,1610612766,Charlotte,Hornets
3,2020-03-09,Final,1610612737,1610612766,143,138,1,1610612766,Charlotte,Hornets
4,2020-02-28,Final,1610612761,1610612766,96,99,0,1610612766,Charlotte,Hornets


### 20. Rename the column `CITY` as 'city_away', `NICKNAME` as 'nickname_away'

In [26]:
games_with_both_teams = games_with_both_teams.rename(columns={'CITY':'city_away','NICKNAME':'nickname_away'})

### 21. Print out the first 5 rows of the new DataFrame

In [28]:
games_with_both_teams.head()

Unnamed: 0,GAME_DATE,GAME_STATUS_TEXT,HOME_TEAM_ID,TEAM_ID_away,POINTS_home,POINTS_away,HOME_TEAM_WINS,TEAM_ID,city_away,nickname_away
0,2020-12-19,Final,1610612753,1610612766,120,117,1,1610612766,Charlotte,Hornets
1,2020-12-17,Final,1610612753,1610612766,115,123,0,1610612766,Charlotte,Hornets
2,2020-03-11,Final,1610612748,1610612766,98,109,0,1610612766,Charlotte,Hornets
3,2020-03-09,Final,1610612737,1610612766,143,138,1,1610612766,Charlotte,Hornets
4,2020-02-28,Final,1610612761,1610612766,96,99,0,1610612766,Charlotte,Hornets


### The team ID columns are not needed after the merge of the DataFrames. 

### The below code is provided to drop the columns 'HOME_TEAM_ID, 'TEAM_ID_away'  from `games_with_both_teams`. 

In [29]:
games_with_both_teams = games_with_both_teams.drop(columns=['HOME_TEAM_ID', 'TEAM_ID_away'])
games_with_both_teams.head()

Unnamed: 0,GAME_DATE,GAME_STATUS_TEXT,POINTS_home,POINTS_away,HOME_TEAM_WINS,TEAM_ID,city_away,nickname_away
0,2020-12-19,Final,120,117,1,1610612766,Charlotte,Hornets
1,2020-12-17,Final,115,123,0,1610612766,Charlotte,Hornets
2,2020-03-11,Final,98,109,0,1610612766,Charlotte,Hornets
3,2020-03-09,Final,143,138,1,1610612766,Charlotte,Hornets
4,2020-02-28,Final,96,99,0,1610612766,Charlotte,Hornets


### 22. Make a copy of `games_with_both_teams` and assign it as `games`

In [30]:
games = games_with_both_teams.copy()

### 23. Change the column names in `games` to all lowercase

In [31]:
games.columns = games.columns.str.lower()

### 24. Print out the columns of `games` to verify the changes

In [32]:
games.columns

Index(['game_date', 'game_status_text', 'points_home', 'points_away',
       'home_team_wins', 'team_id', 'city_away', 'nickname_away'],
      dtype='object')

### 25. Print out the columns of `games_with_both_teams` to verify that the original DataFrame wasn't impacted by the copy

In [33]:
games_with_both_teams.columns

Index(['GAME_DATE', 'GAME_STATUS_TEXT', 'POINTS_home', 'POINTS_away',
       'HOME_TEAM_WINS', 'TEAM_ID', 'city_away', 'nickname_away'],
      dtype='object')

### 26. Check the dimensionality  of `games`

In [34]:
games.shape

(23421, 8)

### 27. Export `games` as a csv file called 'games_transformed.csv', and open the csv file to look at it

Feel free to test the difference of the csv files with or without the argument `index=False`

In [35]:
games.to_csv('games_transformed.csv',index=False)

### 28. Select all the columns of 'number' dtypes from `games`

In [36]:
games.select_dtypes(include='number')

Unnamed: 0,points_home,points_away,home_team_wins,team_id
0,120,117,1,1610612766
1,115,123,0,1610612766
2,98,109,0,1610612766
3,143,138,1,1610612766
4,96,99,0,1610612766
...,...,...,...,...
23416,93,104,0,1610612743
23417,104,101,1,1610612743
23418,110,90,1,1610612743
23419,97,89,1,1610612743


### 29. Select all the columns NOT of 'number' dtypes from `games`

In [37]:
games.select_dtypes(exclude='number')

Unnamed: 0,game_date,game_status_text,city_away,nickname_away
0,2020-12-19,Final,Charlotte,Hornets
1,2020-12-17,Final,Charlotte,Hornets
2,2020-03-11,Final,Charlotte,Hornets
3,2020-03-09,Final,Charlotte,Hornets
4,2020-02-28,Final,Charlotte,Hornets
...,...,...,...,...
23416,2014-10-18,Final,Denver,Nuggets
23417,2014-10-16,Final,Denver,Nuggets
23418,2014-10-13,Final,Denver,Nuggets
23419,2014-10-10,Final,Denver,Nuggets


### 30. Print out the first 5 rows of `games` as a reference

In [38]:
games.head()

Unnamed: 0,game_date,game_status_text,points_home,points_away,home_team_wins,team_id,city_away,nickname_away
0,2020-12-19,Final,120,117,1,1610612766,Charlotte,Hornets
1,2020-12-17,Final,115,123,0,1610612766,Charlotte,Hornets
2,2020-03-11,Final,98,109,0,1610612766,Charlotte,Hornets
3,2020-03-09,Final,143,138,1,1610612766,Charlotte,Hornets
4,2020-02-28,Final,96,99,0,1610612766,Charlotte,Hornets


### 31. Select the row with label 0

In [39]:
games.loc[0]

game_date           2020-12-19 00:00:00
game_status_text                  Final
points_home                         120
points_away                         117
home_team_wins                        1
team_id                      1610612766
city_away                     Charlotte
nickname_away                   Hornets
Name: 0, dtype: object

### 32. Select the row with integer position 0

In [40]:
games.iloc[0]

game_date           2020-12-19 00:00:00
game_status_text                  Final
points_home                         120
points_away                         117
home_team_wins                        1
team_id                      1610612766
city_away                     Charlotte
nickname_away                   Hornets
Name: 0, dtype: object

### 33. Set the column `game_date` as the index of DataFrame `games`

In [41]:
games=games.set_index('game_date')

### 34. Print out the index of `games` to verify the changes

In [42]:
games.index

DatetimeIndex(['2020-12-19', '2020-12-17', '2020-03-11', '2020-03-09',
               '2020-02-28', '2020-02-25', '2020-02-20', '2020-02-12',
               '2020-02-10', '2020-02-04',
               ...
               '2014-11-14', '2014-11-09', '2014-11-05', '2014-11-01',
               '2014-10-24', '2014-10-18', '2014-10-16', '2014-10-13',
               '2014-10-10', '2014-10-06'],
              dtype='datetime64[ns]', name='game_date', length=23421, freq=None)

### 35. Select the rows with label '2020-12-18'

In [43]:
games.loc['2020-12-18']

Unnamed: 0_level_0,game_status_text,points_home,points_away,home_team_wins,team_id,city_away,nickname_away
game_date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2020-12-18,Final,107,113,0,1610612755,Philadelphia,76ers
2020-12-18,Final,105,117,0,1610612748,Miami,Heat
2020-12-18,Final,119,83,1,1610612739,Cleveland,Cavaliers
2020-12-18,Final,89,113,0,1610612751,Brooklyn,Nets
2020-12-18,Final,127,113,1,1610612749,Milwaukee,Bucks
2020-12-18,Final,103,105,0,1610612741,Chicago,Bulls
2020-12-18,Final,129,96,1,1610612757,Portland,Trail Blazers
2020-12-18,Final,113,114,0,1610612747,Los Angeles,Lakers


### 36. Select the rows with labels from '2020-12-18' to '2020-12-19'

In [44]:
games.loc['2012-12-18':'2020-12-19']

Unnamed: 0_level_0,game_status_text,points_home,points_away,home_team_wins,team_id,city_away,nickname_away
game_date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2020-12-19,Final,120,117,1,1610612766,Charlotte,Hornets
2020-12-17,Final,115,123,0,1610612766,Charlotte,Hornets
2020-03-11,Final,98,109,0,1610612766,Charlotte,Hornets
2020-03-09,Final,143,138,1,1610612766,Charlotte,Hornets
2020-02-28,Final,96,99,0,1610612766,Charlotte,Hornets
...,...,...,...,...,...,...,...
2014-10-18,Final,93,104,0,1610612743,Denver,Nuggets
2014-10-16,Final,104,101,1,1610612743,Denver,Nuggets
2014-10-13,Final,110,90,1,1610612743,Denver,Nuggets
2014-10-10,Final,97,89,1,1610612743,Denver,Nuggets


### 37. Select the rows with labels of '2020-12-18' and '2019-12-18'

In [45]:
games.loc['2019-12-18']

Unnamed: 0_level_0,game_status_text,points_home,points_away,home_team_wins,team_id,city_away,nickname_away
game_date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2019-12-18,Final,100,98,1,1610612766,Charlotte,Hornets
2019-12-18,Final,104,108,0,1610612748,Miami,Heat
2019-12-18,Final,109,110,0,1610612741,Chicago,Bulls
2019-12-18,Final,122,112,1,1610612744,Golden State,Warriors
2019-12-18,Final,103,109,0,1610612738,Boston,Celtics
2019-12-18,Final,99,112,0,1610612761,Toronto,Raptors
2019-12-18,Final,99,107,0,1610612740,New Orleans,Pelicans
2019-12-18,Final,126,122,1,1610612763,Memphis,Grizzlies
2019-12-18,Final,113,104,1,1610612753,Orlando,Magic


### 38. Select the rows with `points_home` greater than 150

In [46]:
games[games['points_home'] > 150]

Unnamed: 0_level_0,game_status_text,points_home,points_away,home_team_wins,team_id,city_away,nickname_away
game_date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2006-12-27,Final,151,145,1,1610612765,Detroit,Pistons
2019-11-30,Final,158,111,1,1610612737,Atlanta,Hawks
2019-03-01,Final,161,168,0,1610612741,Chicago,Bulls
2010-03-16,Final,152,114,1,1610612750,Minnesota,Timberwolves
2020-08-25,Final,154,111,1,1610612742,Dallas,Mavericks
2006-12-07,Final,157,161,0,1610612756,Phoenix,Suns
2020-01-28,Final,151,131,1,1610612764,Washington,Wizards
2020-01-26,Final,152,133,1,1610612764,Washington,Wizards
2019-10-30,Final,158,159,0,1610612745,Houston,Rockets
2008-03-16,Final,168,116,1,1610612760,Oklahoma City,Thunder


### 39. Select the rows with `points_home` greater than 150, and `home_team_wins` not being 1

In [47]:
games[(games['points_home'] > 150) & (games['home_team_wins'] != 1)]

Unnamed: 0_level_0,game_status_text,points_home,points_away,home_team_wins,team_id,city_away,nickname_away
game_date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2019-03-01,Final,161,168,0,1610612741,Chicago,Bulls
2006-12-07,Final,157,161,0,1610612756,Phoenix,Suns
2019-10-30,Final,158,159,0,1610612745,Houston,Rockets


### 40. Select the rows with `points_home` greater than 150, and `home_team_wins` not being 1, as well as the columns `home_team_wins` and `points_home`

In [48]:
games.loc[(games['points_home'] > 150) & (games['home_team_wins'] != 1), ['points_home', 'home_team_wins']]

Unnamed: 0_level_0,points_home,home_team_wins
game_date,Unnamed: 1_level_1,Unnamed: 2_level_1
2019-03-01,161,0
2006-12-07,157,0
2019-10-30,158,0


### 41. Reset the index of `games` back to default and verify the changes

In [49]:
games=games.reset_index()
games.head()

Unnamed: 0,game_date,game_status_text,points_home,points_away,home_team_wins,team_id,city_away,nickname_away
0,2020-12-19,Final,120,117,1,1610612766,Charlotte,Hornets
1,2020-12-17,Final,115,123,0,1610612766,Charlotte,Hornets
2,2020-03-11,Final,98,109,0,1610612766,Charlotte,Hornets
3,2020-03-09,Final,143,138,1,1610612766,Charlotte,Hornets
4,2020-02-28,Final,96,99,0,1610612766,Charlotte,Hornets


### 42. Add a new column called `points_total`, as the sum of columns `points_home` and `points_away`

In [50]:
games['points_total'] = games['points_home'] + games['points_away']

### 43. Verify the changes by printing out the three columns `points_home`, `points_away`, `points_total`

In [51]:
games[['points_home','points_away','points_total']]

Unnamed: 0,points_home,points_away,points_total
0,120,117,237
1,115,123,238
2,98,109,207
3,143,138,281
4,96,99,195
...,...,...,...
23416,93,104,197
23417,104,101,205
23418,110,90,200
23419,97,89,186


### 44. Print out the 3 rows with the largest `points_total` using the `nlargest` method

In [52]:
games.nlargest(n=3,columns='points_total')

Unnamed: 0,game_date,game_status_text,points_home,points_away,home_team_wins,team_id,city_away,nickname_away,points_total
6758,2019-03-01,Final,161,168,0,1610612741,Chicago,Bulls,329
16809,2006-12-07,Final,157,161,0,1610612756,Phoenix,Suns,318
20043,2019-10-30,Final,158,159,0,1610612745,Houston,Rockets,317


### 45. Sort the DataFrame `games` by its `points_total` column in ascending order

Don't forget to reassign the sorted result back to `games`

In [53]:
games=games.sort_values(by='points_total',ascending=True)

### 46. Print out the last 3 rows of the sorted DataFrame

Verify that it's the same three rows as the previous example (`nlargest`)

In [54]:
games.tail(3)

Unnamed: 0,game_date,game_status_text,points_home,points_away,home_team_wins,team_id,city_away,nickname_away,points_total
20043,2019-10-30,Final,158,159,0,1610612745,Houston,Rockets,317
16809,2006-12-07,Final,157,161,0,1610612756,Phoenix,Suns,318
6758,2019-03-01,Final,161,168,0,1610612741,Chicago,Bulls,329


### 47. Given that the DataFrame is sorted by `points_total`, select the row with the smallest `points_total` using `iloc`

In [55]:
games.iloc[0]

game_date           2007-10-19 00:00:00
game_status_text                  Final
points_home                          36
points_away                          33
home_team_wins                        1
team_id                      1610612751
city_away                      Brooklyn
nickname_away                      Nets
points_total                         69
Name: 4999, dtype: object

### 48. Select the rows with the second, and third smallest `points_total` using `iloc`

In [56]:
games.iloc[[1,2]]

Unnamed: 0,game_date,game_status_text,points_home,points_away,home_team_wins,team_id,city_away,nickname_away,points_total
6735,2003-10-08,Final,62,58,1,1610612741,Chicago,Bulls,120
7451,2004-11-09,Final,64,60,1,1610612757,Portland,Trail Blazers,124


### 49. Select a subset including the first 4 rows, and the first 5 columns using `iloc`

In [57]:
games.iloc[:4,:5]

Unnamed: 0,game_date,game_status_text,points_home,points_away,home_team_wins
4999,2007-10-19,Final,36,33,1
6735,2003-10-08,Final,62,58,1
7451,2004-11-09,Final,64,60,1
11362,2005-03-13,Final,64,62,1


### THANK YOU