# Hands On ML with Scikit Learn and TF CH 2: End-to-End ML Project

Currently learning all the necessary steps to take when taking on an ML project.

A Quick summary is shown below:

1. Frame the problem and look at the bigger picture.
2. Get the data.
3. Explore the data.
4. Prepare the data.
5. Short-list promising models.
6. Fine-tune the system.
7. Present the solution.
8. Launch the solution.

The book goes into more detail for each step, but I chose to summarise the 8 main steps, and I will be using them as a template for my projects.

## 1. Frame the problem and look at the bigger picture

The problem I need to solve here is to determine the median price of a house in a given district. This price will be used to decide whether investors should invest in housing in that district. Currently, prices are calculated manually by experts, which is time-consuming and expensive for the company, so I need to develop a model that predicts the price of a house based on its district.

This is a supervised learning problem, as I will be using the California Census Data, which includes the target values, to train the model. Furthermore, this is a regression problem, as the model has to predict a numerical value. The performance measure can be the RMSE (Root Mean Square Error) or the MAE (Mean Absolute Error). We will have to look at the data to determine which one to use.
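To make the difference between the two measures concrete, here is a minimal sketch on made-up numbers (the arrays and the helper names `rmse` and `mae` are illustrative, not taken from the book or the housing data). RMSE squares the errors before averaging, so a single large miss raises it much more than it raises the MAE.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared difference."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean Absolute Error: mean of the absolute differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Illustrative values only (think of them as prices in thousands)
actual = [200, 150, 300, 250]
predicted = [210, 140, 290, 350]  # the last prediction is a large miss

print("RMSE:", rmse(actual, predicted))
print("MAE:", mae(actual, predicted))
```

If the data contains many outlier districts, the MAE may be the more forgiving choice; otherwise RMSE is the usual default for regression.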

Human expertise is available, as the predictions can be checked against the experts' estimates to determine whether the model is accurate to within an acceptable margin.

## 2. Get Data
I will download the data and load the libraries needed to perform the task at hand.

In [None]:
# Load Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn

In [None]:
#Load Data
# VERY NB: It is preferable to write a function that fetches the data online, so that we always have the most up-to-date version of the data

housing_data = pd.read_csv("housing.csv")

## 3. Explore Data (Exploratory Data Analysis)

In [None]:
housing_data.head()

In [None]:
housing_data.info()

I chose to download the data manually; I will learn to fetch data online when doing another project. Having loaded the data, this is what I see:
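For a later project, fetching the data online could look roughly like the sketch below. It is only an outline under stated assumptions: the function name `fetch_and_load_csv` and all of its arguments are hypothetical, and the archive is assumed to be a `.tgz` file with the CSV at its top level. No real download URL is hard-coded here.

```python
import tarfile
import urllib.request
from pathlib import Path

import pandas as pd


def fetch_and_load_csv(archive_url, csv_name, data_dir="datasets"):
    """Download a .tgz archive, extract it, and load one CSV file from it.

    All arguments are supplied by the caller (hypothetical placeholders);
    the archive is assumed to contain `csv_name` at its top level.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    archive_path = data_dir / "download.tgz"

    # Re-download on every call so the local copy matches the published data
    urllib.request.urlretrieve(archive_url, str(archive_path))
    with tarfile.open(archive_path) as archive:
        archive.extractall(path=data_dir)

    return pd.read_csv(data_dir / csv_name)
```

It would then be called with a URL you trust, for example `housing_data = fetch_and_load_csv(url, "housing.csv")`, where `url` points at the archive.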

The data has 10 columns and 20640 entries (rows). All the columns are numerical except for the ocean_proximity column.

The target for this project is the median_house_value column, and the rest are the features I will use.

In [None]:
housing_data.describe()

Let's plot the numerical columns as histograms to see how they are distributed.

In [None]:
housing_data.hist(bins = 20, figsize = (20, 15))
plt.show()

In [None]:
# Let's bin the Median Income into categories.
# We can see that the income for the districts is capped between 0 and 15. So in real life we might have to further research
# how this value was calculated, but for now, will just divide the incomes into classes

housing_data["income_cat"] = np.ceil(housing_data['median_income']/1.5) # Divide income by 1.5 and then round up to get discrete categories
# Keep categories below 5 and merge everything from 5 upwards into category 5
# (assigning the result back avoids relying on inplace = True through a chained column selection)
housing_data["income_cat"] = housing_data["income_cat"].where(housing_data["income_cat"]<5, 5.0)

In [None]:
# now lets plot these and see what we have:

housing_data["income_cat"].hist(bins = 20)
plt.title("Median Income Categories")
plt.show()

In [None]:
import seaborn as sns

sns.countplot(x = "ocean_proximity", data = housing_data)

In [None]:
housing_data["ocean_proximity"].value_counts()

## Train Test Split

So I have to split the data into the training and test sets now. However, this needs to be done with care.

Firstly, splitting at this stage avoids Data Snooping Bias, which might occur if the full data set (including what will become the test set) were explored and visualised first.

Secondly, the test set must be split off in such a way that it has the same distribution as the overall data for the important columns, especially the categorical features.

There are two methods we can use to split the data:
1. The train_test_split function from Sklearn
2. The StratifiedShuffleSplit class from Sklearn

I have used train_test_split more frequently, so I will try the stratified shuffle split now.
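To see why stratification matters, the sketch below compares both approaches on a synthetic frame with an imbalanced category column. The data and the names `toy`, `random_test` and `strat_test` are made up for illustration; this is not the housing data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit, train_test_split

# Synthetic stand-in for an imbalanced categorical column
rng = np.random.RandomState(42)
toy = pd.DataFrame({
    "category": rng.choice(["A", "B", "C"], size=1000, p=[0.7, 0.25, 0.05]),
    "value": rng.rand(1000),
})
overall = toy["category"].value_counts(normalize=True)

# Purely random split
_, random_test = train_test_split(toy, test_size=0.2, random_state=42)

# Stratified split on the category column
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(toy, toy["category"]))
strat_test = toy.iloc[test_idx]  # the splitter returns positions, so use iloc

comparison = pd.DataFrame({
    "overall": overall,
    "random": random_test["category"].value_counts(normalize=True),
    "stratified": strat_test["category"].value_counts(normalize=True),
}).sort_index()
print(comparison)
```

The stratified column should track the overall proportions closely, while the random column can drift, most visibly for the rare category. Note that the indices returned by the splitter are positional; `.loc` gives the same rows only when the frame still has its default integer index.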

In [None]:
# Now lets use a Stratified Split based on the Income Categories
from sklearn.model_selection import StratifiedShuffleSplit
split_2 = StratifiedShuffleSplit(n_splits = 1, test_size = 0.2, random_state = 42)
for train_index, test_index in split_2.split(housing_data, housing_data["income_cat"]):
    strat_train_set = housing_data.loc[train_index]
    strat_test_set = housing_data.loc[test_index]
print(len(strat_train_set), len(strat_test_set))

In [None]:
strat_train_set

In [None]:
strat_test_set

In [None]:
from sklearn.model_selection import train_test_split

training_data, test_data = train_test_split(housing_data, test_size = 0.2, random_state =42)
print(len(training_data), len(test_data))

In [None]:
training_data

In [None]:
test_data

In [None]:
## I will drop the Income Categories column from my training set (Strat Split)

strat_train_set = strat_train_set.drop("income_cat", axis = 1)

In [None]:
strat_train_set

## Data Visualization

In [None]:
strat_test_set = strat_test_set.drop("income_cat", axis = 1)

So let's plot the training set to see what patterns are visible in the data. We also want to see the correlations between the attributes and the target attribute.

In [None]:
housing = strat_train_set.copy()

In [None]:
housing.plot(kind = "scatter", x = 'longitude', y = 'latitude')
plt.show()

This plot gives a very basic outline of the California region, but not much valuable information. So some plot parameters need to be added to improve it.

In [None]:
housing.plot(kind = "scatter", x = 'longitude', y = 'latitude', alpha = 0.1) # Check the density of the different districts

In [None]:
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
    s=housing["population"]/100, label="population", figsize=(10,7),
    c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True,
    sharex=False)
plt.legend()


Now we have a nicer plot, which shows the population of each district (marker size) and the price of the houses (colour). We can even take it further and plot the data on top of an image of the state of California.

In [None]:
import matplotlib.image as mpimg
california_img=mpimg.imread('california.png')
ax = housing.plot(kind="scatter", x="longitude", y="latitude", figsize=(10,7),
                       s=housing['population']/100, label="Population",
                       c="median_house_value", cmap=plt.get_cmap("jet"),
                       colorbar=False, alpha=0.4,
                      )
plt.imshow(california_img, extent=[-124.55, -113.80, 32.45, 42.05], alpha=0.5,
           cmap=plt.get_cmap("jet"))
plt.ylabel("Latitude", fontsize=14)
plt.xlabel("Longitude", fontsize=14)

prices = housing["median_house_value"]
tick_values = np.linspace(prices.min(), prices.max(), 11)
cbar = plt.colorbar()
cbar.ax.set_yticklabels(["$%dk"%(round(v/1000)) for v in tick_values], fontsize=14)
cbar.set_label('Median House Value', fontsize=16)

plt.legend(fontsize=16)
plt.show()

### Data Correlation

Now I want to see the correlations between the target attribute and the other attributes. There are two ways to do this:

1. Get the Correlation Coefficient to see for linear correlations
2. Plot a scatter matrix(pandas function), or pairplot(seaborn function)
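One caveat with the first option is that the correlation coefficient only picks up linear relationships, which is why the plots in the second option are still worth making. A minimal sketch on made-up data (the arrays are illustrative only):

```python
import numpy as np

x = np.linspace(-1, 1, 201)
linear = 2 * x + 1   # perfectly linear in x
quadratic = x ** 2   # fully determined by x, but not linearly

r_linear = np.corrcoef(x, linear)[0, 1]
r_quadratic = np.corrcoef(x, quadratic)[0, 1]

print("r for the linear relationship:", r_linear)
print("r for the quadratic relationship:", r_quadratic)
```

The quadratic relationship gives a coefficient of essentially zero even though y depends entirely on x, so a low r value does not mean the attribute is useless.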

In [None]:
corr_matrix = housing.corr(numeric_only = True) # Get the correlation matrix (ocean_proximity is text, so only the numerical columns are used)
corr_matrix["median_house_value"].sort_values(ascending = False) # See how the different attributes linearly correlate with the target attribute

The median income has a positive correlation with the median house value. Of all the attributes, its r value is the closest to 1.

In [None]:
# Plot the scatter matrix using Pandas

attributes = ["median_house_value", "median_income", "total_rooms",
              "housing_median_age"] # All these attributes have a positive correlation. 

from pandas.plotting import scatter_matrix

scatter_matrix(housing[attributes], figsize = (15,10))

In [None]:
# I want to further investigate the relationship between the median income and median House value
housing.plot(kind = "scatter", x = "median_income", y = "median_house_value", color = 'red' ,figsize = (10,7), alpha = 0.1)
plt.title("Median Income vs Median House Price")
plt.show()

### Experimenting with Attribute Combinations

Now I will combine some attributes to see if they give more insight into this data.

For example, the total number of rooms in a district is not very
useful if you don’t know how many households there are. What you really want is the number of rooms
per household. Similarly, the total number of bedrooms by itself is not very useful: you probably want to
compare it to the number of rooms. And the population per household also seems like an interesting
attribute combination to look at.

So I will combine attributes to get the:
1. Number of rooms per household
2. Number of bedrooms per number of rooms
3. Population per Household

Thereafter I want to see how these new attributes correlate with the target attribute (median_house_value).

#### NOTE: Exploratory Data Analysis is an iterative process. I might have to come back and explore again with different combinations, depending on the type of project I am doing.

In [None]:
housing["rooms_per_household"] = housing["total_rooms"]/housing["households"]
housing["bedrooms_per_room"] = housing["total_bedrooms"]/housing["total_rooms"]
housing["population_per_household"]=housing["population"]/housing["households"]

In [None]:
corr_matrix = housing.corr(numeric_only = True) # Only the numerical columns are used
corr_matrix["median_house_value"].sort_values(ascending = False)

## 4. Data Cleaning and Preparation

Now I need to prepare the data before feeding it to an ML algorithm.
There are a few things to note though:

1. Firstly, I need to deal with missing data. I will impute the missing values with the median value.
2. Secondly, I need to deal with text or categorical data. For this, I will need to encode the data from categories to numbers, since ML algorithms work with numbers.

So let's do that!

In [None]:
# Lets find out which columns have missing values

strat_train_set.info()

In [None]:
# Impute the missing values with the median values for all attributes.

housing_data = strat_train_set.drop("median_house_value", axis = 1) # Need to drop the labels and work with just the features
housing_data_labels = strat_train_set["median_house_value"] # Get the labels
housing_data_num = housing_data.drop("ocean_proximity", axis = 1)# The imputer works with numerical values, so we will have drop the OceanProximity Column

from sklearn.impute import SimpleImputer # Imputer was removed from sklearn.preprocessing in scikit-learn 0.22

imputer = SimpleImputer(strategy = 'median')

X = imputer.fit_transform(housing_data_num) # Fit the data and then transform it to impute the missing values using median 

In [None]:
housing_data_tr = pd.DataFrame(X, columns = housing_data_num.columns)
housing_data_tr

In [None]:
housing_data_tr.info()

### Data Transformations

Create a pipeline to automate the transformations to the data.

Transformations:

1. Impute Missing Data Points with median values
2. Do Feature Scaling: Standardisation
3. Encode categorical data
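Before running these three transformations on the housing data, it can help to see what they do to a tiny made-up frame. The sketch below uses hypothetical column names and values; `handle_unknown="ignore"` is my own addition, chosen so that a category unseen during fitting does not raise an error later.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame with one missing number and one text column (made-up values)
toy = pd.DataFrame({
    "rooms": [3.0, np.nan, 5.0, 4.0],
    "income": [2.0, 4.0, 6.0, 8.0],
    "proximity": ["INLAND", "NEAR BAY", "INLAND", "ISLAND"],
})

num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),  # fill the NaN with the median
    ("scaler", StandardScaler()),                   # zero mean, unit variance
])
preprocess = ColumnTransformer([
    ("num", num_pipeline, ["rooms", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["proximity"]),
])

prepared = preprocess.fit_transform(toy)
# The result can be a SciPy sparse matrix or a dense array, depending on sparsity
prepared = prepared.toarray() if hasattr(prepared, "toarray") else prepared

print(prepared.shape)  # 2 scaled numerical columns + 3 one-hot columns
print(prepared)
```

Each row ends up with exactly one 1 among the one-hot columns, and the numerical columns are centred on zero.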

In [None]:
from sklearn.pipeline import Pipeline

feature_matrix = strat_train_set.drop("median_house_value", axis = 1) # Drop the labels column
label_vector = strat_train_set["median_house_value"] # Get our labels vector

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder

num_features = feature_matrix.drop("ocean_proximity", axis = 1) # Get the numerical features
#cat_features = feature_matrix["ocean_proximity"] # Get the categorical features 

# Get a pipeline to perform transformations on the numerical features
num_pipeline =Pipeline([
    ("imputer",  SimpleImputer(strategy = "median")), # impute missing values with median
    ("scaler", StandardScaler()) # Scale the data using Standardisation
])

# Use ColumnTransformer to encode the categorical data
num_attr = list(num_features) # Put the column names in num_features in a list
cat_attr = ["ocean_proximity"] # Get the name of the column with categorical features in a list

full_pipeline = ColumnTransformer([
    ("num", num_pipeline, num_attr),
    ("cat", OneHotEncoder(), cat_attr)
])

feature_matrix = full_pipeline.fit_transform(feature_matrix)

In [None]:
feature_matrix.shape

In [None]:
num_attr =list(num_features)
print(num_attr)

In [None]:
cat_attr = ["ocean_proximity"]
print(cat_attr)

In [None]:
feature_matrix

In [None]:
label_vector

## 5. ShortList Some Promising Models

Pick some models, and train them on the data and evaluate them. Usually pick between 2 and 5 models

Models to work with:

1. Linear Regression model
2. Decision Tree model
3. RandomForest Model
4. XGBoost

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score

def train_eval(training_features, training_labels):
    
    models = [LinearRegression(), DecisionTreeRegressor(), RandomForestRegressor(), XGBRegressor()]
    
    for model in models:
        # Conduct 5-fold cross validation on each model
        # (cross_val_score fits a fresh clone on every fold, so no separate fit is needed)
        # With no scoring argument, the score is the regressor's default: R^2
        scores = cross_val_score(model, training_features, training_labels, cv = 5)
        print(type(model).__name__)
        print("Scores:", scores*100)
        print("Standard deviation:", (scores.std())*100)
        


In [None]:
# Lets Train and evaluate the models now

train_eval(feature_matrix, label_vector)

cross_val_score with scoring = "neg_mean_squared_error" returns the negated MSE for each fold, so every score is negative. Taking np.sqrt of those values directly gives NaNs, which is the output I first got from my training and evaluation function. The scores have to be negated before taking the square root.

As a workaround at the time, I removed the scoring parameter. The result is the default R^2 score for each fold, shown as a percentage, together with the standard deviation.

###### Model Performance:

The Random Forest Regressor performed the best.

In [None]:
# The same check with neg_mean_squared_error, negating the scores before the square root
lin_reg = LinearRegression()
lin_reg_model = lin_reg.fit(feature_matrix, label_vector)
lin_reg_scores = cross_val_score(lin_reg_model, feature_matrix, label_vector, scoring = "neg_mean_squared_error", cv = 5)
lin_rmse = np.sqrt(-lin_reg_scores)
print("Scores:", lin_rmse)
print("Average:", lin_rmse.mean())
print("Standard deviation:", lin_rmse.std())
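The sign convention of neg_mean_squared_error is easy to check on synthetic data. The sketch below is illustrative only: the data is made up, and it shows both the NaN result and the corrected RMSE side by side.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data (made up for illustration)
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=200)

neg_mse = cross_val_score(LinearRegression(), X, y,
                          scoring="neg_mean_squared_error", cv=5)
print("All scores negative:", bool((neg_mse < 0).all()))

with np.errstate(invalid="ignore"):
    wrong = np.sqrt(neg_mse)   # square root of negative numbers gives NaN
right = np.sqrt(-neg_mse)      # negate first to recover the RMSE per fold

print("Without negation:", wrong)
print("RMSE per fold:", right)
```

scikit-learn negates error metrics so that "greater is better" holds for every scorer, which is what lets the search utilities always maximise the score.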


## 6. Fine Tune the model

Now that the models are trained, the best one can be fine-tuned by choosing the hyperparameters that lead to the best score.

There are two ways to do this:

1. GridSearch Cross Validation

2. RandomisedSearch.

Both basically search for the hyperparameter values that give the best cross-validation score.

As previously stated, the RandomForest Regressor performed the best, so it will be fine tuned.
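The practical difference between the two searches is how many candidates they evaluate. The sketch below runs both on small synthetic data with a deliberately tiny grid; the data and the grid values are made up and much smaller than the ones used on the housing data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic regression data (made up for illustration)
rng = np.random.RandomState(42)
X = rng.rand(150, 4)
y = 3 * X[:, 0] + np.sin(6 * X[:, 1]) + rng.normal(scale=0.1, size=150)

params = {"n_estimators": [5, 20], "max_features": [1, 2, 4]}
forest = RandomForestRegressor(random_state=42)

# Grid search tries every combination: 2 * 3 = 6 candidates
grid = GridSearchCV(forest, params, cv=3, scoring="neg_mean_squared_error")
grid.fit(X, y)

# Randomised search evaluates only n_iter sampled candidates
rand = RandomizedSearchCV(forest, params, n_iter=3, cv=3,
                          scoring="neg_mean_squared_error", random_state=42)
rand.fit(X, y)

print("Grid candidates:", len(grid.cv_results_["params"]))
print("Random candidates:", len(rand.cv_results_["params"]))
print("Best grid RMSE:", np.sqrt(-grid.best_score_), grid.best_params_)
print("Best random RMSE:", np.sqrt(-rand.best_score_), rand.best_params_)
```

With a small grid the exhaustive search is affordable; the randomised search becomes attractive when the number of combinations grows, since `n_iter` caps the cost.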

#### GridSearch Cross Validation

In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor
# Let's set the hyperparameter values, and put them in a list of dictionaries

param_grid = [
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]

randomforest_regressor = RandomForestRegressor()

grid_search = GridSearchCV(randomforest_regressor, param_grid, cv = 5, scoring = "neg_mean_squared_error", return_train_score = True)

grid_search.fit(feature_matrix, label_vector)

In [None]:
grid_search.best_params_

#### Randomized Search Cross Validation

In [None]:
from sklearn.model_selection import RandomizedSearchCV

param_grid =  {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]}

# n_iter defaults to 10, so 10 of the 12 combinations above are sampled
rand_search = RandomizedSearchCV(randomforest_regressor, param_grid, cv = 5, scoring = "neg_mean_squared_error", return_train_score = True)
rand_search.fit(feature_matrix, label_vector)


In [None]:
rand_search.best_params_

In [None]:
## Lets make some predictions on the test set

test_features = strat_test_set.drop("median_house_value", axis = 1)
test_labels = strat_test_set["median_house_value"]

In [None]:
#Transform the test set using our transformation pipeline
test_features = full_pipeline.transform(test_features)

In [None]:
randomforest_regressor = RandomForestRegressor(n_estimators = 30, max_features = 8)

model = randomforest_regressor.fit(feature_matrix, label_vector)
predictions = model.predict(test_features)

print("True Target:", test_labels, "Predicted Target:", predictions)


In [None]:
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(test_labels, predictions)
rmse = np.sqrt(mse) # RMSE is an error measure (lower is better), not an accuracy
print("Test RMSE:", rmse)

From here, one can further improve the model, pick a different model etc, until a satisfactory result is achieved. 

# Book Exercises

### Question 1

Try a Support Vector Machine regressor (sklearn.svm.SVR) with
various hyperparameters, such as kernel="linear" (with various
values for the C hyperparameter) or kernel="rbf" (with various
values for the C and gamma hyperparameters). Don’t worry about what
these hyperparameters mean for now. How does the best SVR predictor
perform?

In [None]:
from sklearn.svm import SVR

# Linear Kernel and various c values(Will use GridSearchCV)

# Step 1: Find the best kernel and C values. 

# Get the hyperparameters we want to check: The kernel, the C value and the gamma value(Find out what C and gamma do)
param_grid = [
        {'kernel': ['linear'], 'C': [10., 30., 100., 300., 1000., 3000., 10000., 30000.0]},
        {'kernel': ['rbf'], 'C': [1.0, 3.0, 10., 30., 100., 300., 1000.0],
         'gamma': [0.01, 0.03, 0.1, 0.3, 1.0, 3.0]},
    ]

# Set up the svr object:

svm_regressor = SVR()

grid_search_svr = GridSearchCV(svm_regressor, param_grid, cv=5, scoring='neg_mean_squared_error', verbose=2, n_jobs=-1)

grid_search_svr.fit(feature_matrix, label_vector)
grid_search_svr.best_params_

In [None]:
negative_mse = grid_search_svr.best_score_
rmse = np.sqrt(-negative_mse)
rmse

In [None]:
# Let's test how the svr does on the test set (default hyperparameters here, not the best ones found above):

svm_regressor = SVR()

model_svr = svm_regressor.fit(feature_matrix, label_vector)
predictions_svm = model_svr.predict(test_features) # Use the SVR model, not the random forest stored in `model`
score_svm = mean_squared_error(test_labels, predictions_svm)
print("SVM Regressor MSE:", score_svm)

### Question 2

Try replacing GridSearchCV with RandomizedSearchCV.

In [None]:
# Changing the hyperparameter values, because the search is REALLY computationally expensive
from scipy.stats import expon, reciprocal
param_distribs = {
        'kernel': ['linear', 'rbf'],
        'C': reciprocal(20, 200000),
        'gamma': expon(scale=1.0),
    }

randsvm_search = RandomizedSearchCV(svm_regressor, param_distributions=param_distribs,
                                n_iter=50, cv=5, scoring='neg_mean_squared_error',
                                verbose=2, n_jobs=-1, random_state=42)
randsvm_search.fit(feature_matrix, label_vector)
randsvm_search.best_params_

In [None]:
negative_mse_2 = randsvm_search.best_score_
rmse_2 = np.sqrt(-negative_mse_2)
rmse_2

So for the Random Search, we use distributions of the hyperparameter values. We can use different distributions.


In [None]:
from scipy.stats import geom, expon
geom_distrib=geom(0.5).rvs(10000, random_state=42)
expon_distrib=expon(scale=1).rvs(10000, random_state=42)
plt.hist(geom_distrib, bins=50)
plt.show()
plt.hist(expon_distrib, bins=50)
plt.show()

For the Gamma Parameter, we used the exponential distribution, with a scale of 1.0, using 10000 samples.

In [None]:
expon_distrib = expon(scale=1.)
samples = expon_distrib.rvs(10000, random_state=42)
plt.figure(figsize=(10, 4))
plt.subplot(121)
plt.title("Exponential distribution (scale=1.0)")
plt.hist(samples, bins=50)
plt.subplot(122)
plt.title("Log of this distribution")
plt.hist(np.log(samples), bins=50)
plt.show()

For the C hyperparameter, we use the reciprocal distribution, for values of C between 20 and 200000, using 10000 samples.

In [None]:
reciprocal_distrib = reciprocal(20, 200000)
samples = reciprocal_distrib.rvs(10000, random_state=42)
plt.figure(figsize=(10, 4))
plt.subplot(121)
plt.title("Reciprocal distribution (20 to 200000)")
plt.hist(samples, bins=50)
plt.subplot(122)
plt.title("Log of this distribution")
plt.hist(np.log(samples), bins=50)
plt.show()

The reciprocal distribution is useful when you have no idea what the scale of the hyperparameter should be (indeed, as you can see on the figure on the right, all scales are equally likely, within the given range), whereas the exponential distribution is best when you know (more or less) what the scale of the hyperparameter should be.
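The claim that all scales are equally likely can also be checked numerically rather than by eye. In this sketch (the band edges are my own choice for illustration), the range from 20 to 200000 is cut into four factor-of-10 bands, and each band should receive roughly a quarter of the samples.

```python
import numpy as np
from scipy.stats import reciprocal

low, high = 20, 200000  # same range as the search above
samples = reciprocal(low, high).rvs(10000, random_state=42)

# Count how many samples land in each factor-of-10 band of the range
edges = [20, 200, 2000, 20000, 200000]
counts, _ = np.histogram(samples, bins=edges)
shares = counts / counts.sum()

for band_low, band_high, share in zip(edges[:-1], edges[1:], shares):
    print(f"{band_low:>6} to {band_high:>6}: {share:.3f}")
```

A uniform distribution over the same range would instead put about 90% of the samples in the last band, which is why the reciprocal distribution suits a hyperparameter like C whose useful values may span several orders of magnitude.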

### NEED TO REVIEW THE DOCS FOR THESE DISTRIBUTIONS BRO!!!

see https://docs.scipy.org/doc/scipy/reference/stats.html

### Question 3

Try adding a transformer in the preparation pipeline to select only the
most important attributes.
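One possible way to approach this is a small custom transformer that keeps the k columns with the highest importance scores. The sketch below is an assumption on my part rather than the book's solution: the class name `TopFeatureSelector` is hypothetical, the importances come from a random forest fitted on synthetic data, and the input is assumed to be a NumPy array.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestRegressor


class TopFeatureSelector(TransformerMixin, BaseEstimator):
    """Keep the k columns with the highest precomputed importance scores."""

    def __init__(self, feature_importances, k):
        self.feature_importances = feature_importances
        self.k = k

    def fit(self, X, y=None):
        importances = np.asarray(self.feature_importances)
        # Indices of the k largest importances, sorted back into column order
        self.feature_indices_ = np.sort(np.argsort(importances)[-self.k:])
        return self

    def transform(self, X):
        return X[:, self.feature_indices_]


# Synthetic data: only columns 0 and 3 influence the target
rng = np.random.RandomState(42)
X = rng.rand(300, 5)
y = 5 * X[:, 0] + 3 * X[:, 3] + rng.normal(scale=0.05, size=300)

forest = RandomForestRegressor(n_estimators=50, random_state=42).fit(X, y)
selector = TopFeatureSelector(forest.feature_importances_, k=2)
X_top = selector.fit_transform(X)

print(selector.feature_indices_, X_top.shape)
```

In the housing project, such a step could be placed after the preparation pipeline, with the importances taken from the best random forest found during fine-tuning.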

In [None]:
# Now add the ability to determine the most important features



In [None]:
class Person():
    def __init__(self, name, surname):
        self.name = name
        self.surname = surname

In [None]:
hunter = Person("Mbasa", "Cokile")

In [None]:
print(hunter.name)

In [None]:
print(hunter.name, hunter.surname)

In [None]:
class Car(): # Parent Class
    def exclaim(self):
        print("I am a Car!")

class Yugo(Car):# Child Class
    pass

In [None]:
give_me_yugo = Yugo()
give_me_yugo.exclaim()

In [None]:
give_me_car = Car().exclaim()

In [None]:
class Person():
    def __init__(self, name):
        self.name = name
    def exclaim(self):
        print("HAHAHA, really!!")
class MDPerson(Person):
    def __init__(self, name, position):
        super().__init__(name)
        self.position  = position 
class JDPerson(Person):
    def __init__(self, name, position):
        super().__init__(name)
        self.position = position

# We have changed the initialisation method(), in the child classes. Lets test it mate

In [None]:
# Create Objects of each class
someone = Person("Mike Oxsmall")

doctor = MDPerson("Mike Oxsmall", "Doctor")

lawyer = JDPerson("Mike Oxsmall", "Lawyer")

In [None]:
# Print the name attributes of each object

print(someone.name)
print(doctor.position, doctor.name)
print(lawyer.name, lawyer.position)

In [None]:
class Person_1():
    def __init__(self, name):
        self.name = name
class EmailPerson(Person_1):
    def __init__(self, name, email):
        super().__init__(name)
        self.email = email

In [None]:
details = EmailPerson("Mbasa Cokile", "mbasacokile7@yahoo.com")
print(details.name, details.email)


In [None]:
person_1 = Person("Mike Oxsmall")
Person.exclaim(person_1)

In [None]:
person = Person("Mike Oxsmall")
Person.exclaim(person)

In [None]:
person = Person("Cory Chatsworth")

In [None]:
person.name = "Mike Oxmall"

In [None]:
print(person.name)

#### 3 Types of methods:
1. Object Method (Always has the `self` argument in the function def)
2. Class Method, has a preceding decorator (`@classmethod`), and its first argument is the class itself, conventionally named `cls`
3. Static Method, also has a preceding decorator (`@staticmethod`), and receives neither `self` nor `cls`, only the arguments passed explicitly
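A compact sketch with all three kinds of method in one class (the class `Counter` and its method names are made up for illustration):

```python
class Counter:
    created = 0  # class attribute shared by all instances

    def __init__(self, label):
        self.label = label
        Counter.created += 1

    def describe(self):      # object (instance) method: receives self
        return f"Counter {self.label}"

    @classmethod
    def how_many(cls):       # class method: receives the class as cls
        return cls.created

    @staticmethod
    def banner(text):        # static method: no self or cls, only explicit arguments
        return f"*** {text} ***"


first = Counter("a")
second = Counter("b")

print(first.describe())
print(Counter.how_many())
print(Counter.banner("done"))
```

The class method and the static method can both be called on the class itself, without creating an object first.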

In [None]:
class A():
    count = 0
    def __init__(self): # Object Method
        A.count +=1 # Class Attribute
    def exclaim(self): # Object Method
        print("I'm an A bruv, BOOM!")
    @classmethod
    def kids(cls): # Class Method
        print("A has ", cls.count, "Little Objects") #Prints out how many objects the class has
        

In [None]:
# Create some objects
breezy_A = A()
Eazy_A = A()
Skilly_A = A()
Filly_A = A()
Freakin_A = A()

# use the class method now:

A.kids()

## Introduction to Python Exercises(Chapter 6: OOP)

#### Question 1

Make a class called Thing with no contents and print it. Then, create an object called
example from this class and also print it. Are the printed values the same or different?

In [1]:
class Thing():
    pass

print(Thing())

<__main__.Thing object at 0x0000020C8B72FE88>


In [2]:
example = Thing()
print(example)

<__main__.Thing object at 0x0000020C8B73E748>


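No, the printed values are different. Both cells print a Thing instance (the first cell prints Thing() rather than the class itself), and each instance is a separate object with its own memory address. Printing the class with print(Thing) would show <class '__main__.Thing'> instead.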

#### Question 2:
    
Make a new class called Thing2 and assign the value 'abc' to a class attribute called
letters. Print letters

In [9]:
class Thing2():
    letters = "abc"

print(Thing2.letters)

abc


#### Question 3

Make yet another class called, of course, Thing3. This time, assign the value 'xyz'
to an instance (object) attribute called letters. Print letters. Do you need to make
an object from the class to do this?

In [11]:
class Thing3():
    def __init__(self):
        self.letters = "xyz" # Object Attribute



something = Thing3()
print(something.letters)
# Awe, so the attribute is for an object created from the class, so you have to make an object to print the letters

xyz


#### Question 4

Make a class called Element, with instance attributes name, symbol, and number.
Create an object of this class with the values 'Hydrogen', 'H', and 1

In [12]:
class Element():
    def __init__(self, name, symbol, number):
        self.name = name
        self.symbol = symbol
        self.number = number
        
element = Element("Hydrogen", "H", 1)

#### Question 5

Make a dictionary with these keys and values: 'name': 'Hydrogen', 'symbol':
'H', 'number': 1. Then, create an object called hydrogen from class Element using
this dictionary.

In [14]:
element_dict = {'name': "Hydrogen", 'symbol': "H", "number": 1}

hydrogen = Element(**element_dict)

In [15]:
hydrogen.name

'Hydrogen'

#### Question 6

For the Element class, define a method called dump() that prints the values of the
object’s attributes (name, symbol, and number). Create the hydrogen object from this new
definition and use dump() to print its attributes

In [20]:
class Element():
    def __init__(self, name, symbol, number):
        self.name = name
        self.symbol = symbol
        self.number = number
        
    def dump(self):
        print("Name: ",self.name,", Symbol:",self.symbol,", Number: ",self.number)
        
hydrogen = Element("Hydrogen", "H", 1)
hydrogen.dump()

Name:  Hydrogen , Symbol: H , Number:  1


In [None]:
# The author did it like this:

def dump(self):
    print('name=%s, symbol=%s, number=%s' % (self.name, self.symbol, self.number))
    
# Basically, does the same thing, as long we got to print the attributes

#### Question 7

Call print(hydrogen). In the definition of Element, change the name of method
dump to __str__, create a new hydrogen object, and call print(hydrogen) again

In [21]:
print(hydrogen)

<__main__.Element object at 0x0000020C8B7E7088>


In [28]:
class Element():
    def __init__(self, name, symbol, number):
        self.name = name
        self.symbol = symbol
        self.number = number
        
    def __str__(self):
        return('name=%s, symbol=%s, number=%s' % 
              (self.name, self.symbol, self.number))

In [29]:
hydrogen = Element("Hydrogen", "H", 1)
print(hydrogen)

name=Hydrogen, symbol=H, number=1


#### Question 8

Modify Element to make the attributes name, symbol, and number private. Define a
getter property for each to return its value.

In [30]:
class Element():
    def __init__(self, name, symbol, number):
        self.__name = name
        self.__symbol = symbol
        self.__number = number
    
    @property
    def name(self):
        return self.__name
    
    @property
    def symbol(self):
        return self.__symbol
    
    @property
    def number(self):
        return self.__number

In [32]:
hydrogen = Element(**element_dict)

hydrogen.number

1

In [33]:
hydrogen.name


'Hydrogen'

In [34]:
hydrogen.symbol

'H'
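Because only getters are defined, the attributes are read-only from the outside. A quick check with a trimmed-down version of the class (reduced to a single attribute purely for illustration):

```python
class Element:
    def __init__(self, name):
        self.__name = name  # stored as _Element__name (name mangling)

    @property
    def name(self):
        return self.__name


hydrogen = Element("Hydrogen")
print(hydrogen.name)

try:
    hydrogen.name = "Helium"  # no setter was defined for this property
except AttributeError as error:
    print("Cannot assign:", type(error).__name__)

# The mangled name is still reachable, so this is a convention, not true privacy
print(hydrogen._Element__name)
```

The double underscore only renames the attribute behind the scenes; it discourages direct access but does not prevent it.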

#### Question 9

Define three classes: Bear, Rabbit, and Octothorpe. For each, define only one
method: eats(). This should return 'berries' (Bear), 'clover' (Rabbit), or
'campers' (Octothorpe). Create one object from each and print what it eats.

In [37]:
class Bear():
    def eats(self):
        return "Bears eat berries"

class Rabbit():
    def eats(self):
        return "Rabbits eat clovers"

class Octothorpe():
    def eats(self):
        return "Octothorpes eat campers"

In [36]:
bear = Bear()
print(bear.eats())

Bears eat berries


In [38]:
rabbit = Rabbit()
print(rabbit.eats())

Rabbits eat clovers


In [39]:
octopus = Octothorpe()
print(octopus.eats())

Octothorpes eat campers


#### Question 10

Define these classes: Laser, Claw, and SmartPhone. Each has only one method:
does(). This returns 'disintegrate' (Laser), 'crush' (Claw), or 'ring' (Smart
Phone). Then, define the class Robot that has one instance (object) of each of these.
Define a does() method for the Robot that prints what its component objects do.

In [64]:
class Laser():
    def does(self):
        return "Disintegrate"
    
class Claw ():
    def does(self):
        return "Crush"
    
class Smartphone():
    def does(self):
        return "Ring"
    


In [74]:
class Robot:
    def __init__(self):
        self.laser = Laser()
        self.claw = Claw()
        self.smartphone = Smartphone()
        
    def does(self):
        return '''I have many attachments: My laser, to %s. My claw, to %s. My smartphone, to %s.''' % (self.laser.does(), self.claw.does(),self.smartphone.does() )
    

In [75]:
robot = Robot()

robot.does()

'I have many attachments: My laser, to Disintegrate. My claw, to Crush. My smartphone, to Ring.'