# Task 1
## What is semi-supervised learning?
### Semi-supervised learning:
A branch of machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training.
### Examples of Semi-Supervised Learning

- Speech analysis
- Anomaly detection
- Text classification
- Image classification
- Internet content classification
- Protein sequence classification
    
### Motivation: Disadvantages of Supervised and Unsupervised Learning

The main disadvantage of supervised learning is that the dataset has to be labeled by hand, typically by a machine learning engineer or a data scientist. This is very costly, especially when dealing with large volumes of data. The main disadvantage of unsupervised learning is that its range of applications is limited.

Semi-supervised learning was introduced to counter these disadvantages. The algorithm is trained on a combination of labeled and unlabeled data, typically a very small amount of labeled data and a very large amount of unlabeled data. One basic procedure is to first cluster similar data points with an unsupervised learning algorithm and then use the existing labels to label the remaining unlabeled points in each cluster. Typical use cases share one property: acquiring unlabeled data is relatively cheap, while labeling it is very expensive.
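The cluster-then-label procedure described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the synthetic two-blob data, the choice of `KMeans`, the `-1` marker for "unlabeled", and the names `y_partial` and `y_filled` are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic data (assumption): two well-separated 2-D blobs, 50 points each.
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])
true_labels = np.array([0] * 50 + [1] * 50)

# Only four points carry a label; -1 marks an unlabeled point.
y_partial = np.full(len(X), -1)
labeled_idx = [0, 1, 50, 51]
y_partial[labeled_idx] = true_labels[labeled_idx]

# Step 1: cluster all points, ignoring the labels.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Step 2: give every unlabeled point the majority label of its cluster.
y_filled = y_partial.copy()
for c in np.unique(clusters):
    in_cluster = clusters == c
    known = y_partial[in_cluster & (y_partial != -1)]
    if known.size:  # a cluster without any labeled point stays unlabeled
        majority = np.bincount(known).argmax()
        y_filled[in_cluster & (y_partial == -1)] = majority

# Fraction of points whose propagated label matches the hidden true label.
agreement = (y_filled == true_labels).mean()
print(f"agreement with true labels: {agreement:.2f}")
```

The approach only works when the clusters line up with the classes; on real data the propagated labels should be checked against a held-out labeled sample.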

# Task 2
## Exercise
<p>Predict Canada's per capita income in the year 2020. There is an exercise folder on GitHub at the same level as this notebook; download it and you will find the canada_per_capita_income.csv file. Using this file, build a regression model and predict the per capita income of Canadian citizens in the year 2020.</p>

### Answer

In [1]:
import pandas as pd 
import numpy as np
from sklearn.linear_model import LinearRegression

In [2]:
df = pd.read_csv("canada_per_capita_income.csv")

In [3]:
df

| | year | per capita income (US$) |
|---|---|---|
| 0 | 1970 | 3399.299037 |
| 1 | 1971 | 3768.297935 |
| 2 | 1972 | 4251.175484 |
| 3 | 1973 | 4804.463248 |
| 4 | 1974 | 5576.514583 |
| 5 | 1975 | 5998.144346 |
| 6 | 1976 | 7062.131392 |
| 7 | 1977 | 7100.12617 |
| 8 | 1978 | 7247.967035 |
| 9 | 1979 | 7602.912681 |


In [4]:
# Features: every column except the target (here only "year")
x = df.drop(columns="per capita income (US$)")
# Target: the per capita income
y = df["per capita income (US$)"]

In [5]:
model = LinearRegression()
model.fit(x, y)
# R² (coefficient of determination) on the training data
model.score(x, y)

0.890916917957032

In [6]:
model.predict([[2020]])



array([41288.69409442])
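To make explicit what `fit`, `predict` and `score` compute for a single feature, here is a minimal pure-Python sketch of the closed-form least-squares line and its R². It uses only the ten rows displayed above, so its numbers are not expected to match the notebook output, which was presumably computed on the whole CSV file. The variable and function names are illustrative.

```python
# The ten rows displayed above, typed in by hand (not the whole CSV file).
years = list(range(1970, 1980))
incomes = [3399.299037, 3768.297935, 4251.175484, 4804.463248, 5576.514583,
           5998.144346, 7062.131392, 7100.12617, 7247.967035, 7602.912681]

n = len(years)
mean_year = sum(years) / n
mean_income = sum(incomes) / n

# Closed-form least-squares estimates for one feature:
#   slope = S_xy / S_xx,  intercept = mean(y) - slope * mean(x)
s_xy = sum((x - mean_year) * (y - mean_income) for x, y in zip(years, incomes))
s_xx = sum((x - mean_year) ** 2 for x in years)
slope = s_xy / s_xx
intercept = mean_income - slope * mean_year


def predict(year):
    """Evaluate the fitted line at the given year."""
    return intercept + slope * year


# R² = 1 - SS_res / SS_tot, the quantity LinearRegression.score() returns.
ss_res = sum((y - predict(x)) ** 2 for x, y in zip(years, incomes))
ss_tot = sum((y - mean_income) ** 2 for y in incomes)
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.4f}")
print(f"prediction for 2020 from these ten rows: {predict(2020):.2f}")
```

Predicting 2020 extrapolates well beyond the years in the data, so the result depends heavily on the assumption that the trend stays linear.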

# Task 3
## Exercise
At the same level as this notebook on GitHub, there is an Exercise folder that contains carprices.csv.
This file has car sale prices for 3 different models. First, plot the data points on a scatter plot
to see whether a linear regression model can be applied. If so, build a model that can answer
the following questions:

**1) Predict the price of a Mercedes-Benz that is 4 years old with a mileage of 45000**

**2) Predict the price of a BMW X5 that is 7 years old with a mileage of 86000**

**3) Tell me the score (R², the coefficient of determination) of your model. (Hint: use LinearRegression().score())**

### Answer

In [7]:
df = pd.read_csv("carprices.csv")
df

| | Car Model | Mileage | Sell Price($) | Age(yrs) |
|---|---|---|---|---|
| 0 | BMW X5 | 69000 | 18000 | 6 |
| 1 | BMW X5 | 35000 | 34000 | 3 |
| 2 | BMW X5 | 57000 | 26100 | 5 |
| 3 | BMW X5 | 22500 | 40000 | 2 |
| 4 | BMW X5 | 46000 | 31500 | 4 |
| 5 | Audi A5 | 59000 | 29400 | 5 |
| 6 | Audi A5 | 52000 | 32000 | 5 |
| 7 | Audi A5 | 72000 | 19300 | 6 |
| 8 | Audi A5 | 91000 | 12000 | 8 |
| 9 | Mercedez Benz C class | 67000 | 22000 | 6 |
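The exercise asks for a scatter plot before fitting. A minimal sketch of that check, assuming matplotlib is available, is shown here. It rebuilds the ten rows displayed above by hand so that it is self-contained; in the notebook you would plot the `df` read from carprices.csv instead. The Pearson correlations are an addition of this sketch, printed as a numeric companion to the plots.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# The ten rows displayed above, typed in by hand (not the whole CSV file).
df = pd.DataFrame({
    "Car Model": ["BMW X5"] * 5 + ["Audi A5"] * 4 + ["Mercedez Benz C class"],
    "Mileage": [69000, 35000, 57000, 22500, 46000, 59000, 52000, 72000, 91000, 67000],
    "Sell Price($)": [18000, 34000, 26100, 40000, 31500, 29400, 32000, 19300, 12000, 22000],
    "Age(yrs)": [6, 3, 5, 2, 4, 5, 5, 6, 8, 6],
})

# One panel per numeric feature, each plotted against the sale price.
fig, (ax_mileage, ax_age) = plt.subplots(1, 2, figsize=(10, 4))
for ax, feature in [(ax_mileage, "Mileage"), (ax_age, "Age(yrs)")]:
    ax.scatter(df[feature], df["Sell Price($)"])
    ax.set_xlabel(feature)
    ax.set_ylabel("Sell Price($)")
fig.tight_layout()

# Pearson correlation of each feature with the price.
corr_mileage = df["Mileage"].corr(df["Sell Price($)"])
corr_age = df["Age(yrs)"].corr(df["Sell Price($)"])
print(f"corr(Mileage, price) = {corr_mileage:.3f}")
print(f"corr(Age, price)     = {corr_age:.3f}")
```

If the points in both panels fall roughly along a downward-sloping line, a linear model is a reasonable first choice.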


In [8]:
dummies = pd.get_dummies(df['Car Model'])
dummies

| | Audi A5 | BMW X5 | Mercedez Benz C class |
|---|---|---|---|
| 0 | False | True | False |
| 1 | False | True | False |
| 2 | False | True | False |
| 3 | False | True | False |
| 4 | False | True | False |
| 5 | True | False | False |
| 6 | True | False | False |
| 7 | True | False | False |
| 8 | True | False | False |
| 9 | False | False | True |


In [9]:
merged = pd.concat([df,dummies],axis='columns')
merged

| | Car Model | Mileage | Sell Price($) | Age(yrs) | Audi A5 | BMW X5 | Mercedez Benz C class |
|---|---|---|---|---|---|---|---|
| 0 | BMW X5 | 69000 | 18000 | 6 | False | True | False |
| 1 | BMW X5 | 35000 | 34000 | 3 | False | True | False |
| 2 | BMW X5 | 57000 | 26100 | 5 | False | True | False |
| 3 | BMW X5 | 22500 | 40000 | 2 | False | True | False |
| 4 | BMW X5 | 46000 | 31500 | 4 | False | True | False |
| 5 | Audi A5 | 59000 | 29400 | 5 | True | False | False |
| 6 | Audi A5 | 52000 | 32000 | 5 | True | False | False |
| 7 | Audi A5 | 72000 | 19300 | 6 | True | False | False |
| 8 | Audi A5 | 91000 | 12000 | 8 | True | False | False |
| 9 | Mercedez Benz C class | 67000 | 22000 | 6 | False | False | True |


In [10]:
# Drop the text column and one dummy column; the dropped model becomes the baseline
final = merged.drop(["Car Model", "Mercedez Benz C class"], axis='columns')
final

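| | Mileage | Sell Price($) | Age(yrs) | Audi A5 | BMW X5 |
|---|---|---|---|---|---|
| 0 | 69000 | 18000 | 6 | False | True |
| 1 | 35000 | 34000 | 3 | False | True |
| 2 | 57000 | 26100 | 5 | False | True |
| 3 | 22500 | 40000 | 2 | False | True |
| 4 | 46000 | 31500 | 4 | False | True |
| 5 | 59000 | 29400 | 5 | True | False |
| 6 | 52000 | 32000 | 5 | True | False |
| 7 | 72000 | 19300 | 6 | True | False |
| 8 | 91000 | 12000 | 8 | True | False |
| 9 | 67000 | 22000 | 6 | False | False |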
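One dummy column is dropped because the three dummy columns always sum to 1, which makes them perfectly collinear with the model's intercept (the "dummy variable trap"). As a hedged alternative to the separate `get_dummies`, `concat` and `drop` steps, pandas can encode and drop in one call with `drop_first=True`. Note that this drops the alphabetically first category (`Audi A5`) rather than `Mercedez Benz C class`, so the baseline model and the column names differ from the notebook. The three-row frame below is illustrative only.

```python
import pandas as pd

# A few rows in the same shape as the table above (illustrative subset).
df = pd.DataFrame({
    "Car Model": ["BMW X5", "Audi A5", "Mercedez Benz C class"],
    "Mileage": [69000, 59000, 67000],
    "Sell Price($)": [18000, 29400, 22000],
    "Age(yrs)": [6, 5, 6],
})

# Encode "Car Model" and drop its first category in one step.
# The dropped category ("Audi A5") becomes the baseline: both dummies are 0 for it.
encoded = pd.get_dummies(df, columns=["Car Model"], drop_first=True)
print(encoded.columns.tolist())
```

With this encoding, a query row for the model must follow the new column order and use the `Car Model_...` dummies instead of the `Audi A5` and `BMW X5` columns used in this notebook.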


In [11]:
X = final.drop('Sell Price($)',axis='columns')
y = final['Sell Price($)']

In [12]:
model = LinearRegression()
model.fit(X,y)
model.score(X,y)

0.9417050937281082

**1) Predict the price of a Mercedes-Benz that is 4 years old with a mileage of 45000**

In [13]:
# Feature order: Mileage, Age(yrs), Audi A5, BMW X5 (both dummies 0 = Mercedez Benz C class)
model.predict([[45000, 4, 0, 0]])



array([36991.31721061])

**2) Predict the price of a BMW X5 that is 7 years old with a mileage of 86000**

In [14]:
# Feature order: Mileage, Age(yrs), Audi A5, BMW X5 (BMW X5 dummy set to 1)
model.predict([[86000, 7, 0, 1]])



array([11080.74313219])
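Passing bare lists to `predict` relies on remembering the column order used for training (`Mileage`, `Age(yrs)`, `Audi A5`, `BMW X5`), and recent scikit-learn versions warn when a model fitted on a DataFrame is given input without feature names. A small helper that builds a named one-row DataFrame avoids both problems. The function `make_query` and the constant `FEATURE_COLUMNS` are hypothetical names introduced for this sketch, not part of the original notebook.

```python
import pandas as pd

# Column order of the training features X in this notebook.
FEATURE_COLUMNS = ["Mileage", "Age(yrs)", "Audi A5", "BMW X5"]


def make_query(mileage, age, car_model):
    """Build a one-row DataFrame with the same columns as the training features.

    Any car_model other than "Audi A5" or "BMW X5" gets 0 in both dummy
    columns, which is how the baseline "Mercedez Benz C class" is encoded.
    """
    row = {
        "Mileage": mileage,
        "Age(yrs)": age,
        "Audi A5": int(car_model == "Audi A5"),
        "BMW X5": int(car_model == "BMW X5"),
    }
    return pd.DataFrame([row], columns=FEATURE_COLUMNS)


mercedes_query = make_query(45000, 4, "Mercedez Benz C class")
bmw_query = make_query(86000, 7, "BMW X5")
print(mercedes_query)
print(bmw_query)
```

With the fitted model from cell 12, `model.predict(make_query(45000, 4, "Mercedez Benz C class"))` should correspond to the first prediction above and `model.predict(make_query(86000, 7, "BMW X5"))` to the second.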

**3) Tell me the score (R², the coefficient of determination) of your model. (Hint: use LinearRegression().score())**

In [15]:
model.score(X,y)

0.9417050937281082