# 4.1 *Column* Operations  
Note: This notebook uses a Python dictionary for the first time. I've added a [Python Dictionary Notebook](../01_PythonBasics/1.5_Dictionaries.ipynb) as a reference.

- [Selecting Columns](#Selecting-Columns)
  - Selecting a Single Column
  - Selecting Multiple Columns
- [Creating new Dataframes based on Selected Columns from an Existing Dataframe](#Creating-new-Dataframes-based-on-Selected-Columns-from-an-Existing-Dataframe)
- [Removing Columns](#Removing-Columns)
- [Keeping Columns](#Keeping-Columns)
- [Renaming Columns](#Renaming-Columns)
  - Renaming a Single Column
  - Renaming Multiple Columns
  - Renaming All Columns
- [Reordering Columns](#Reordering-Columns)
- [Sorting Columns](#Sorting-Columns)
  - Sorting by One Column
  - Sorting by Multiple Columns
- [References](#References)


- **Data Files Required**  
  - Data_Olympics.csv   
  


In [1]:
import pandas as pd

In [2]:
# Read the CSV file into a pandas dataframe
df = pd.read_csv('Data/Data_Olympics.csv')

# Display the first five records/rows in the dataframe
df.head()

Unnamed: 0,Rank,Country,Gold,Silver,Bronze,Total
0,1,United States (USA),46,37,38,121
1,2,Great Britain (GBR),27,23,17,67
2,3,China (CHN),26,18,26,70
3,4,Russia (RUS),19,17,19,55
4,5,Germany (GER),17,10,15,42


# Selecting Columns

### Selecting a Single Column

In [3]:
# Select the Country column and display the first ten rows
df['Country'].head(10)

0    United States (USA)
1    Great Britain (GBR)
2            China (CHN)
3           Russia (RUS)
4          Germany (GER)
5            Japan (JPN)
6           France (FRA)
7      South Korea (KOR)
8            Italy (ITA)
9        Australia (AUS)
Name: Country, dtype: object
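
Selecting with a single column name returns a pandas Series, as the `Name: Country, dtype: object` footer above shows. The sketch below contrasts that with passing a one-item list, which returns a one-column DataFrame. The `df_demo` dataframe and the other variable names are illustrative stand-ins, not part of the notebook's data files.

```python
import pandas as pd

# Small stand-in for the Olympics data (illustrative values only)
df_demo = pd.DataFrame({
    'Country': ['United States (USA)', 'Great Britain (GBR)', 'China (CHN)'],
    'Gold': [46, 27, 26],
})

# A single column name returns a Series
country_series = df_demo['Country']

# A list containing one column name returns a one-column DataFrame
country_frame = df_demo[['Country']]

print(type(country_series).__name__)
print(type(country_frame).__name__)
```

The list form is handy when later code expects a DataFrame rather than a Series.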

### Selecting Multiple Columns

In [4]:
# Display the first five rows of df
df.head()

Unnamed: 0,Rank,Country,Gold,Silver,Bronze,Total
0,1,United States (USA),46,37,38,121
1,2,Great Britain (GBR),27,23,17,67
2,3,China (CHN),26,18,26,70
3,4,Russia (RUS),19,17,19,55
4,5,Germany (GER),17,10,15,42


In [5]:
# Create a List with the selected column names
selected_columns = ['Country', 'Rank', 'Total']

# Display just those df columns
df[selected_columns].head()

Unnamed: 0,Country,Rank,Total
0,United States (USA),1,121
1,Great Britain (GBR),2,67
2,China (CHN),3,70
3,Russia (RUS),4,55
4,Germany (GER),5,42


# Creating new Dataframes based on Selected Columns from an Existing Dataframe

In [6]:
# Create a new dataframe named df_gold that has the data from the Country and Gold columns of df
selected_columns = ['Country', 'Gold']
df_gold = df[selected_columns]

# Display first five rows
df_gold.head()

Unnamed: 0,Country,Gold
0,United States (USA),46
1,Great Britain (GBR),27
2,China (CHN),26
3,Russia (RUS),19
4,Germany (GER),17
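
If the new dataframe will be modified later, one common precaution is to add an explicit `.copy()` so it is clearly independent of the original (this can also avoid a `SettingWithCopyWarning` in some pandas versions). A minimal sketch, using a hypothetical `df_demo` in place of the Olympics data:

```python
import pandas as pd

# Illustrative stand-in for the Olympics dataframe
df_demo = pd.DataFrame({
    'Country': ['United States (USA)', 'Great Britain (GBR)'],
    'Gold': [46, 27],
    'Silver': [37, 23],
})

# .copy() makes df_gold_demo an independent dataframe
df_gold_demo = df_demo[['Country', 'Gold']].copy()

# Changing the copy leaves the original untouched
df_gold_demo['Gold'] = df_gold_demo['Gold'] + 1

print(df_demo['Gold'].tolist())
print(df_gold_demo['Gold'].tolist())
```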


# Removing Columns

### Removing a Single Column from a DataFrame

In [7]:
#Read the CSV file
df = pd.read_csv('https://raw.githubusercontent.com/justmarkham/pandas-videos/master/data/drinks.csv')
df.head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,Asia
1,Albania,89,132,54,4.9,Europe
2,Algeria,25,0,14,0.7,Africa
3,Andorra,245,138,312,12.4,Europe
4,Angola,217,57,45,5.9,Africa


In [8]:
# Display the column names of df
df.columns

Index(['country', 'beer_servings', 'spirit_servings', 'wine_servings',
       'total_litres_of_pure_alcohol', 'continent'],
      dtype='object')

In [9]:
# Create a new df named df_no_spirits based on df with the spirit_servings column dropped
# Note: inplace=False (the default) leaves df unchanged and returns a new dataframe
df_no_spirits = df.drop('spirit_servings', axis='columns', inplace=False)

df_no_spirits.columns

Index(['country', 'beer_servings', 'wine_servings',
       'total_litres_of_pure_alcohol', 'continent'],
      dtype='object')

In [10]:
# Display df_no_spirits (first five rows)
df_no_spirits.head()

Unnamed: 0,country,beer_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0.0,Asia
1,Albania,89,54,4.9,Europe
2,Algeria,25,14,0.7,Africa
3,Andorra,245,312,12.4,Europe
4,Angola,217,45,5.9,Africa


### Removing Multiple Columns from a DataFrame

In [11]:
#Read the CSV file
df = pd.read_csv('https://raw.githubusercontent.com/justmarkham/pandas-videos/master/data/drinks.csv')
df.head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,Asia
1,Albania,89,132,54,4.9,Europe
2,Algeria,25,0,14,0.7,Africa
3,Andorra,245,138,312,12.4,Europe
4,Angola,217,57,45,5.9,Africa


In [12]:
# Drop the beer_servings and wine_servings columns from df
# Note: inplace=True modifies df directly (we are not creating a new dataframe)
df.drop(['beer_servings', 'wine_servings'], axis='columns', inplace=True)

df.columns

Index(['country', 'spirit_servings', 'total_litres_of_pure_alcohol',
       'continent'],
      dtype='object')

In [13]:
df.head(2)

Unnamed: 0,country,spirit_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0.0,Asia
1,Albania,132,4.9,Europe
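
As an alternative spelling of the drops above, `drop` also accepts a `columns=` keyword, and `errors='ignore'` skips labels that do not exist instead of raising a `KeyError`. The sketch below assumes a small hypothetical `df_demo` shaped like the drinks data; `not_a_column` is a deliberately missing label.

```python
import pandas as pd

# Illustrative stand-in for the drinks dataframe
df_demo = pd.DataFrame({
    'country': ['Afghanistan', 'Albania'],
    'beer_servings': [0, 89],
    'spirit_servings': [0, 132],
    'wine_servings': [0, 54],
})

# columns= is equivalent to passing the labels with axis='columns'
df_trimmed = df_demo.drop(columns=['beer_servings', 'wine_servings'])

# errors='ignore' skips labels that are not present instead of raising KeyError
df_safe = df_demo.drop(columns=['wine_servings', 'not_a_column'], errors='ignore')

print(list(df_trimmed.columns))
print(list(df_safe.columns))
```

Neither call uses `inplace=True`, so `df_demo` keeps all four columns.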


# Keeping Columns

In [14]:
# Read the CSV file
df = pd.read_csv('https://raw.githubusercontent.com/justmarkham/pandas-videos/master/data/drinks.csv')
df.head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,Asia
1,Albania,89,132,54,4.9,Europe
2,Algeria,25,0,14,0.7,Africa
3,Andorra,245,138,312,12.4,Europe
4,Angola,217,57,45,5.9,Africa


In [15]:
# Create a List (columns_to_keep) with the columns we want to keep from df:
# country, beer_servings, spirit_servings and continent
columns_to_keep = ['country', 'beer_servings', 'spirit_servings', 'continent']

In [16]:
# Create a new df (df_new_columns) with just those columns
df_new_columns = df[columns_to_keep]

df_new_columns.head()

Unnamed: 0,country,beer_servings,spirit_servings,continent
0,Afghanistan,0,0,Asia
1,Albania,89,132,Europe
2,Algeria,25,0,Africa
3,Andorra,245,138,Europe
4,Angola,217,57,Africa
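
When the unwanted columns are never needed, another option is to keep only the wanted ones at read time with the `usecols` parameter of `pd.read_csv`. The sketch below reads from an in-memory string (`csv_text`, a made-up excerpt) so that it runs without any data file; with a real file you would pass the path instead.

```python
import io
import pandas as pd

# Made-up excerpt standing in for a CSV file on disk
csv_text = """country,beer_servings,spirit_servings,wine_servings,continent
Afghanistan,0,0,0,Asia
Albania,89,132,54,Europe
"""

columns_to_keep = ['country', 'beer_servings', 'continent']

# Only the listed columns are parsed and loaded
df_subset = pd.read_csv(io.StringIO(csv_text), usecols=columns_to_keep)

print(list(df_subset.columns))
```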


# Renaming Columns

### Renaming a Single Column

In [17]:
# Read the CSV file
df_drinking = pd.read_csv('https://raw.githubusercontent.com/justmarkham/pandas-videos/master/data/drinks.csv')
df_drinking.head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,Asia
1,Albania,89,132,54,4.9,Europe
2,Algeria,25,0,14,0.7,Africa
3,Andorra,245,138,312,12.4,Europe
4,Angola,217,57,45,5.9,Africa


In [18]:
# Show column names
df_drinking.columns

Index(['country', 'beer_servings', 'spirit_servings', 'wine_servings',
       'total_litres_of_pure_alcohol', 'continent'],
      dtype='object')

In [19]:
# Rename the country Column to Country
# Note: Uses a Python Dictionary
df_drinking.rename(columns = {'country': 'Country'}, inplace=True)

df_drinking.head()

Unnamed: 0,Country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,Asia
1,Albania,89,132,54,4.9,Europe
2,Algeria,25,0,14,0.7,Africa
3,Andorra,245,138,312,12.4,Europe
4,Angola,217,57,45,5.9,Africa


### Renaming Multiple Columns

In [20]:
# Show column names for df_drinking
df_drinking.columns

Index(['Country', 'beer_servings', 'spirit_servings', 'wine_servings',
       'total_litres_of_pure_alcohol', 'continent'],
      dtype='object')

#### Rename columns into NEW Data Frame:  
- 'beer_servings' to 'Beer'  
-   'spirit_servings' to 'Spirits'
-   'wine_servings' to 'Wine'

In [21]:
# Create the new_columns Dictionary with the renamings in it
new_columns = {'beer_servings':'Beer', 
               'spirit_servings':'Spirits', 
               'wine_servings':'Wine'}

# Create a new dataframe with the renamed columns
# Note: inplace=False returns a renamed copy, so the result must be assigned; df_drinking is unchanged
df_drinking_renamed = df_drinking.rename(columns = new_columns, inplace=False)
df_drinking_renamed.head()

Unnamed: 0,Country,Beer,Spirits,Wine,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,Asia
1,Albania,89,132,54,4.9,Europe
2,Algeria,25,0,14,0.7,Africa
3,Andorra,245,138,312,12.4,Europe
4,Angola,217,57,45,5.9,Africa
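
One thing to watch with a renaming dictionary: by default, `rename` silently ignores keys that do not match any column, so a typo produces no error and no change. A minimal sketch of this, using a hypothetical `df_demo` and a deliberately misspelled key, with `errors='raise'` to surface the problem:

```python
import pandas as pd

# Illustrative stand-in for the drinks dataframe
df_demo = pd.DataFrame({'beer_servings': [0, 89], 'wine_servings': [0, 54]})

# The key 'beer_serving' is misspelled, so by default nothing is renamed
df_unchanged = df_demo.rename(columns={'beer_serving': 'Beer'})

# errors='raise' turns the silent no-op into a KeyError
try:
    df_demo.rename(columns={'beer_serving': 'Beer'}, errors='raise')
    typo_detected = False
except KeyError:
    typo_detected = True

print(list(df_unchanged.columns))
print(typo_detected)
```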


### Renaming *All* Columns

In [22]:
# Reread csv file to start with a fresh dataframe
# Read the csv file into a pandas dataframe
df = pd.read_csv('Data/Data_Olympics.csv')

df.columns

Index(['Rank', 'Country', 'Gold', 'Silver', 'Bronze', 'Total'], dtype='object')

In [23]:
# Create list of new column names (in correct order):
new_column_names = ['rank', 'country', 'gold medals', 'silver medals', 'bronze medals', 'total number of medals']

# Change column names
df.columns = new_column_names
df.columns

Index(['rank', 'country', 'gold medals', 'silver medals', 'bronze medals',
       'total number of medals'],
      dtype='object')

In [24]:
df.head()

Unnamed: 0,rank,country,gold medals,silver medals,bronze medals,total number of medals
0,1,United States (USA),46,37,38,121
1,2,Great Britain (GBR),27,23,17,67
2,3,China (CHN),26,18,26,70
3,4,Russia (RUS),19,17,19,55
4,5,Germany (GER),17,10,15,42
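
When every column needs the same kind of change, typing out a full list of new names can be avoided by transforming the existing names with the `.str` methods on `df.columns`. The sketch below lowercases the names and replaces spaces with underscores; `df_demo` and its column names are illustrative only.

```python
import pandas as pd

# Illustrative dataframe with mixed-case names and a space
df_demo = pd.DataFrame(
    [[1, 'United States (USA)', 121]],
    columns=['Rank', 'Country', 'Total Medals'],
)

# Lowercase every name, then replace spaces with underscores
df_demo.columns = df_demo.columns.str.lower().str.replace(' ', '_', regex=False)

print(list(df_demo.columns))
```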


# Reordering Columns

In [25]:
# Reread csv file to start with a fresh dataframe
# Read the csv file into a pandas dataframe
df = pd.read_csv('Data/Data_Olympics.csv')

df.columns

Index(['Rank', 'Country', 'Gold', 'Silver', 'Bronze', 'Total'], dtype='object')

In [26]:
df.head()

Unnamed: 0,Rank,Country,Gold,Silver,Bronze,Total
0,1,United States (USA),46,37,38,121
1,2,Great Britain (GBR),27,23,17,67
2,3,China (CHN),26,18,26,70
3,4,Russia (RUS),19,17,19,55
4,5,Germany (GER),17,10,15,42


In [27]:
# Create a List containing the reordered columns
# Note: List every column you want to keep (any column left out of the list is dropped)
#       and spell each name exactly as it appears in df.columns
# Reorder to: 'Country', 'Rank', 'Total', 'Gold', 'Silver', 'Bronze'

new_cols = ['Country', 'Rank', 'Total', 'Gold', 'Silver', 'Bronze']
new_cols

['Country', 'Rank', 'Total', 'Gold', 'Silver', 'Bronze']

In [28]:
df.head(2)

Unnamed: 0,Rank,Country,Gold,Silver,Bronze,Total
0,1,United States (USA),46,37,38,121
1,2,Great Britain (GBR),27,23,17,67


In [29]:
# Use the List to reorder the columns and display resulting dataframe
df = df[new_cols]
df.head()

Unnamed: 0,Country,Rank,Total,Gold,Silver,Bronze
0,United States (USA),1,121,46,37,38
1,Great Britain (GBR),2,67,27,23,17
2,China (CHN),3,70,26,18,26
3,Russia (RUS),4,55,19,17,19
4,Germany (GER),5,42,17,10,15
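
For wide dataframes, writing out every column name by hand is error-prone. One possible shortcut is to name only the columns that should come first and build the rest of the list from `df.columns`. The names `df_demo`, `first`, and `rest` below are illustrative.

```python
import pandas as pd

# Illustrative stand-in for the Olympics dataframe
df_demo = pd.DataFrame({
    'Rank': [1, 2],
    'Country': ['United States (USA)', 'Great Britain (GBR)'],
    'Gold': [46, 27],
    'Total': [121, 67],
})

# Columns to move to the front, in the desired order
first = ['Country', 'Total']

# Every remaining column, in its existing order
rest = [col for col in df_demo.columns if col not in first]

df_reordered = df_demo[first + rest]

print(list(df_reordered.columns))
```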


# Sorting Columns

In [30]:
# Reread csv file to start with a fresh dataframe
# Read the csv file into a pandas dataframe
df = pd.read_csv('Data/Data_Olympics.csv')
df.head(10)

Unnamed: 0,Rank,Country,Gold,Silver,Bronze,Total
0,1,United States (USA),46,37,38,121
1,2,Great Britain (GBR),27,23,17,67
2,3,China (CHN),26,18,26,70
3,4,Russia (RUS),19,17,19,55
4,5,Germany (GER),17,10,15,42
5,6,Japan (JPN),12,8,21,41
6,7,France (FRA),10,18,14,42
7,8,South Korea (KOR),9,3,9,21
8,9,Italy (ITA),8,12,8,28
9,10,Australia (AUS),8,11,10,29


### Sorting by One Column 

In [31]:
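# Sort the existing dataframe by number of Gold medals, descending
# Note: inplace=True sorts df itself instead of returning a new dataframe
# (The rows are already ranked by Gold, so the displayed order does not change)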
df.sort_values('Gold', inplace=True, ascending=False)
df.head()

Unnamed: 0,Rank,Country,Gold,Silver,Bronze,Total
0,1,United States (USA),46,37,38,121
1,2,Great Britain (GBR),27,23,17,67
2,3,China (CHN),26,18,26,70
3,4,Russia (RUS),19,17,19,55
4,5,Germany (GER),17,10,15,42
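
Sorting moves each row together with its original index label, which is easy to miss in the example above because the data is already in Gold order. The sketch below sorts a small hypothetical `df_demo` by `Total` instead, then uses `reset_index(drop=True)` to renumber the rows from 0.

```python
import pandas as pd

# Illustrative stand-in for the Olympics dataframe
df_demo = pd.DataFrame({
    'Country': ['United States (USA)', 'Great Britain (GBR)', 'China (CHN)'],
    'Total': [121, 67, 70],
})

# Sort by Total, descending; rows keep their original index labels
df_by_total = df_demo.sort_values('Total', ascending=False)

# Renumber the rows 0, 1, 2 and discard the old labels
df_by_total_reset = df_by_total.reset_index(drop=True)

print(df_by_total.index.tolist())
print(df_by_total_reset.index.tolist())
```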


### Sorting by Multiple Columns  

In [32]:
#Read the CSV file
df_countries = pd.read_csv('https://raw.githubusercontent.com/justmarkham/pandas-videos/master/data/drinks.csv')
df_countries.head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,Asia
1,Albania,89,132,54,4.9,Europe
2,Algeria,25,0,14,0.7,Africa
3,Andorra,245,138,312,12.4,Europe
4,Angola,217,57,45,5.9,Africa


In [33]:
# Sort df_countries by continent and then country (both ascending)
df_countries.sort_values(['continent', 'country'], inplace=True, ascending=[True, True])

df_countries.head(5)

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
2,Algeria,25,0,14,0.7,Africa
4,Angola,217,57,45,5.9,Africa
18,Benin,34,4,13,1.1,Africa
22,Botswana,173,35,35,5.4,Africa
26,Burkina Faso,25,7,7,4.3,Africa


In [34]:
# Create a new dataframe (df_continents) based on a sorted copy of df_countries
# Sort order: Continent and Country, both Ascending
df_continents = df_countries.sort_values(['continent', 'country'], inplace=False, ascending=[True, True])

df_continents.head(10)

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
2,Algeria,25,0,14,0.7,Africa
4,Angola,217,57,45,5.9,Africa
18,Benin,34,4,13,1.1,Africa
22,Botswana,173,35,35,5.4,Africa
26,Burkina Faso,25,7,7,4.3,Africa
27,Burundi,88,0,0,6.3,Africa
29,Cabo Verde,144,56,16,4.0,Africa
31,Cameroon,147,1,4,5.8,Africa
33,Central African Republic,17,2,1,1.8,Africa
34,Chad,15,1,1,0.4,Africa


# References  
- [Python Dictionaries](../01_PythonBasics/1.5_Dictionaries.ipynb)
- [How do I rename columns in a pandas DataFrame?](https://www.youtube.com/watch?v=0uBirYFhizE&index=5&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y)
- [How do I remove columns from a pandas DataFrame?](https://www.youtube.com/watch?v=gnUKkS964WQ&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=6)


