01-Return Statement I

Let's now drill into the return statement.

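class DataShell:
    def __init__(self, x):
        self.data = x  # __init__ must return None, so store x instead

    def get_data(self):
        return self.data  # hand self.data back to the caller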
In the code chunk above, you may have expected to see the print() function instead of the return statement. The difference between the two is that print() writes the string form of its arguments to the console (and itself returns None), while the return statement exits the current function (or method) and hands the returned value back to its caller. In this case, the caller could be another function, among other things. If this sounds confusing, have no fear; we will practice this further!
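To see the difference in isolation, here is a minimal sketch with two hypothetical functions, print_double() and return_double(), which are not part of the exercise:

```python
def print_double(x):
    # print() writes to the console; the function itself returns None
    print(x * 2)


def return_double(x):
    # return hands the value back to the caller and displays nothing
    return x * 2


printed = print_double(21)    # displays 42, but printed is None
returned = return_double(21)  # displays nothing, but returned is 42

# Only the returned value can be reused by the caller
print(printed)       # None
print(returned + 1)  # 43
```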

In the console, enter this code in order to answer the question below:

x = my_data_shell.get_data()
print(x)
What value does the my_data_shell.get_data() method return?

In [1]:
class DataShell:
    def __init__(self, x):
        self.data = x

    def get_data(self):
        return self.data
        
# Declare variable with list of integers from 1 to 5: integer_list
integer_list = [1, 2, 3, 4, 5]
        
# Instantiate DataShell taking integer_list as argument: my_data_shell
my_data_shell = DataShell(integer_list)
x = my_data_shell.get_data()
print(x)

[1, 2, 3, 4, 5]
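Note that get_data() returns the very list stored in self.data, not a copy of it. A minimal sketch reusing the class above:

```python
class DataShell:
    def __init__(self, x):
        self.data = x

    def get_data(self):
        return self.data


integer_list = [1, 2, 3, 4, 5]
my_data_shell = DataShell(integer_list)
x = my_data_shell.get_data()

# get_data() returns a reference to the same list object
print(x is my_data_shell.data)  # True
print(x is integer_list)        # True

# So mutating x also changes the data stored inside the object
x.append(6)
print(my_data_shell.get_data())  # [1, 2, 3, 4, 5, 6]
```

If callers should not be able to modify the object's data this way, get_data() could return list(self.data) instead; that is a design choice, not something the exercise requires.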


02-Return Statement II: The Return of the DataShell

Let's now go back to the class DataShell that we were working with earlier, and refactor it such that it uses the return statement instead of the print() function.

Notice that because our methods now return their values instead of printing them, we need to wrap the method calls in the print() function to display the results.

In [2]:
# Create class: DataShell
class DataShell:

    # Initialize class with self and dataList as arguments
    def __init__(self, dataList):
        # Set data as instance variable, and assign it the value of dataList
        self.data = dataList

    # Define method that returns data: show
    def show(self):
        return self.data

    # Define method that returns the average of data: avg
    def avg(self):
        # Declare avg and assign it the average of data
        # (in Python 3, / is true division, so no float() cast is needed)
        avg = sum(self.data) / len(self.data)
        # Return avg
        return avg
        
# Instantiate DataShell taking integer_list as argument: my_data_shell
my_data_shell = DataShell(integer_list)

# Print output of your object's show method
print(my_data_shell.show())

# Print output of your object's avg method
print(my_data_shell.avg())

[1, 2, 3, 4, 5]
3.0
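The payoff of returning instead of printing is that the caller can keep computing with the result. A small sketch using the same class; the sketch drops the float() cast since / already performs true division in Python 3, and the deviations list is illustrative, not part of the exercise:

```python
class DataShell:
    def __init__(self, dataList):
        self.data = dataList

    def show(self):
        return self.data

    def avg(self):
        return sum(self.data) / len(self.data)


my_data_shell = DataShell([1, 2, 3, 4, 5])

# Because avg() returns its result instead of printing it,
# the caller can feed that value into further computation
mean = my_data_shell.avg()
deviations = [value - mean for value in my_data_shell.show()]
print(deviations)  # [-2.0, -1.0, 0.0, 1.0, 2.0]
```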


03 Return Statement III: A More Powerful DataShell
    
In this exercise, our DataShell class will evolve from consuming lists to reading CSV files, so that we can work with real tabular datasets.

For this, we will use the return statement once again. We will also import the numpy and pandas packages, relying on pandas' read_csv() function to load the file into a DataFrame.

In [3]:
# Load numpy as np and pandas as pd
import numpy as np
import pandas as pd

# Create class: DataShell
class DataShell:
  
    # Initialize class with self and inputFile
    def __init__(self, inputFile):
        self.file = inputFile
        
    # Define generate_csv method, with self argument
    def generate_csv(self):
        self.data_as_csv = pd.read_csv(self.file)
        return self.data_as_csv

# Instantiate DataShell with us_life_expectancy as input argument
data_shell = DataShell('../Data/us_life_expectancy.csv')

# Call data_shell's generate_csv method, assign it to df
df = data_shell.generate_csv()

# Print df
print(df)

           country code  year  life_expectancy
0    United States  USA  1880        39.410000
1    United States  USA  1890        45.209999
2    United States  USA  1901        49.299999
3    United States  USA  1902        50.500000
4    United States  USA  1903        50.599998
..             ...  ...   ...              ...
112  United States  USA  2011        78.681999
113  United States  USA  2012        78.820999
114  United States  USA  2013        78.959999
115  United States  USA  2014        79.099998
116  United States  USA  2015        79.244003

[117 rows x 4 columns]
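pd.read_csv() accepts either a file path or any file-like object, so the same class can be tried without a file on disk. A minimal sketch, assuming an in-memory io.StringIO buffer holding the first three rows of the output above as a stand-in for us_life_expectancy.csv:

```python
import io

import pandas as pd


class DataShell:
    def __init__(self, inputFile):
        self.file = inputFile

    def generate_csv(self):
        # pd.read_csv accepts a path or any file-like object
        self.data_as_csv = pd.read_csv(self.file)
        return self.data_as_csv


# Hypothetical stand-in for us_life_expectancy.csv, held in memory
csv_text = """country,code,year,life_expectancy
United States,USA,1880,39.41
United States,USA,1890,45.209999
United States,USA,1901,49.299999
"""

data_shell = DataShell(io.StringIO(csv_text))
df = data_shell.generate_csv()

print(df.shape)                      # (3, 4)
# The returned DataFrame is the same object stored on the instance
print(df is data_shell.data_as_csv)  # True
```

A StringIO buffer can only be read once, so a second generate_csv() call on this particular object would find it empty; with a file path, each call simply re-reads the file.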


04 Data as Attributes

In the previous coding exercise, you wrote a method within your DataShell class that returns a pandas DataFrame.

In this one, we will load the data into our class as an instance variable when the object is created. This lets us work with it later, for example to rename columns or compute descriptive statistics.

The us_life_expectancy data is available as the CSV file '../Data/us_life_expectancy.csv'.

In [4]:
# Import numpy as np, pandas as pd
import numpy as np
import pandas as pd

# Create class: DataShell
class DataShell:
  
    # Define initialization method
    def __init__(self, filepath):
        # Set filepath as instance variable  
        self.filepath = filepath
        # Set data_as_csv as instance variable
        self.data_as_csv = pd.read_csv(filepath)

# Instantiate DataShell as us_data_shell
us_data_shell = DataShell('../Data/us_life_expectancy.csv')

# Print your object's data_as_csv attribute
print(us_data_shell.data_as_csv)

           country code  year  life_expectancy
0    United States  USA  1880        39.410000
1    United States  USA  1890        45.209999
2    United States  USA  1901        49.299999
3    United States  USA  1902        50.500000
4    United States  USA  1903        50.599998
..             ...  ...   ...              ...
112  United States  USA  2011        78.681999
113  United States  USA  2012        78.820999
114  United States  USA  2013        78.959999
115  United States  USA  2014        79.099998
116  United States  USA  2015        79.244003

[117 rows x 4 columns]
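Unlike generate_csv() from the previous exercise, which calls pd.read_csv() every time it runs, this version reads the file once and keeps the DataFrame on the instance, so changes to the attribute persist. A minimal sketch, assuming a small temporary CSV file and a hypothetical "decade" column added only for the demonstration:

```python
import os
import tempfile

import pandas as pd


class DataShell:
    def __init__(self, filepath):
        self.filepath = filepath
        # The file is read once, when the object is created
        self.data_as_csv = pd.read_csv(filepath)


# Write a small CSV to a temporary file for the demo
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("country,code,year,life_expectancy\n")
    f.write("United States,USA,1880,39.41\n")
    path = f.name

shell = DataShell(path)

# Hypothetical column: the change sticks because every access
# returns the same DataFrame stored on the instance
shell.data_as_csv["decade"] = shell.data_as_csv["year"] // 10 * 10
print(shell.data_as_csv.columns.tolist())
# ['country', 'code', 'year', 'life_expectancy', 'decade']

# Re-reading the file gives a fresh DataFrame without that change
print(pd.read_csv(path).columns.tolist())
# ['country', 'code', 'year', 'life_expectancy']

os.remove(path)
```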


05 Renaming Columns

Methods are especially useful for manipulating their object's data. In this exercise, we will add a method to our DataShell class that renames data columns.

numpy and pandas are already imported in your workspace as np and pd, respectively.

In [5]:
# Create class DataShell
class DataShell:
  
    # Define initialization method
    def __init__(self, filepath):
        self.filepath = filepath
        self.data_as_csv = pd.read_csv(filepath)
    
    # Define method rename_column, with arguments self, column_name, and new_column_name
    def rename_column(self, column_name, new_column_name):
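        # rename() matches whole column labels, so columns that merely
        # contain column_name as a substring are left untouched
        self.data_as_csv = self.data_as_csv.rename(columns={column_name: new_column_name})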

# Instantiate DataShell as us_data_shell with argument us_life_expectancy
us_data_shell = DataShell('../Data/us_life_expectancy.csv')

# Print the column data types of your object's data_as_csv attribute
print(us_data_shell.data_as_csv.dtypes)

# Rename your object's column 'code' to 'country_code'
us_data_shell.rename_column('code', 'country_code')

# Again, print the column data types of your object's data_as_csv attribute
print(us_data_shell.data_as_csv.dtypes)

country             object
code                object
year                 int64
life_expectancy    float64
dtype: object
country             object
country_code        object
year                 int64
life_expectancy    float64
dtype: object
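Why match whole labels when renaming? A substring replacement such as columns.str.replace() rewrites every label that contains the old name, while DataFrame.rename() with a mapping changes only exact matches. A minimal sketch on a hypothetical frame whose column names overlap:

```python
import pandas as pd

# Hypothetical empty frame where one column name contains another
df = pd.DataFrame(columns=["country", "country_code", "year"])

# Substring replacement rewrites every label that contains "country"
via_replace = df.columns.str.replace("country", "nation", regex=False)
print(list(via_replace))  # ['nation', 'nation_code', 'year']

# DataFrame.rename with a mapping only touches exact label matches
via_rename = df.rename(columns={"country": "nation"}).columns
print(list(via_rename))   # ['nation', 'country_code', 'year']
```

For the us_life_expectancy columns both approaches happen to give the same result, since no other label contains "code".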


06 Self-Describing DataShells

In this exercise, you will add functionality to your DataShell class so that it can return information about itself, in the form of descriptive statistics of its data.

numpy and pandas are already imported in your workspace as np and pd, respectively.

In [6]:
# Create class DataShell
class DataShell:

    # Define initialization method
    def __init__(self, filepath):
        self.filepath = filepath
        self.data_as_csv = pd.read_csv(filepath)

    # Define method rename_column, with arguments self, column_name, and new_column_name
    def rename_column(self, column_name, new_column_name):
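        # rename() matches whole column labels, so columns that merely
        # contain column_name as a substring are left untouched
        self.data_as_csv = self.data_as_csv.rename(columns={column_name: new_column_name})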
        
    # Define get_stats method, with argument self
    def get_stats(self):
        # Return descriptive statistics of data_as_csv
        return self.data_as_csv.describe()

# Instantiate DataShell as us_data_shell
us_data_shell = DataShell('../Data/us_life_expectancy.csv')

# Print the output of your object's get_stats method
print(us_data_shell.get_stats())

              year  life_expectancy
count   117.000000       117.000000
mean   1956.752137        66.556684
std      34.398252         9.551079
min    1880.000000        39.410000
25%    1928.000000        58.500000
50%    1957.000000        69.599998
75%    1986.000000        74.772003
max    2015.000000        79.244003
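get_stats() reports only year and life_expectancy because DataFrame.describe() summarizes numeric columns by default. A sketch on a few rows shaped like the data above, assuming we also want the text columns summarized via include="all" (an option the exercise does not use):

```python
import pandas as pd

# A few rows in the same shape as us_life_expectancy.csv
df = pd.DataFrame({
    "country": ["United States"] * 3,
    "code": ["USA"] * 3,
    "year": [1880, 1890, 1901],
    "life_expectancy": [39.41, 45.209999, 49.299999],
})

# By default, describe() summarizes only the numeric columns
print(df.describe().columns.tolist())  # ['year', 'life_expectancy']

# include="all" also summarizes text columns (count, unique, top, freq)
summary = df.describe(include="all")
print(summary.loc["unique", "country"])  # 1
```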
