**Choosing a model that suits the characteristics of your data lets you use computing resources efficiently and gives you a better chance of high accuracy.**

**This notebook covers the linear models among scikit-learn's machine learning models.**

**The basic linear algorithms are linear regression for regression and logistic regression for classification.**

**Both can be extended with L1 and L2 regularization.**

**L1 regularization adds alpha times the sum of the absolute values of the weights to the loss in order to reduce overfitting.**

**L2 regularization adds alpha times the sum of the squared weights to the loss in order to reduce overfitting.**

**Elastic net regularization applies both penalties at the same time.**
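**The three penalty terms described above can be sketched in plain Python. This is a schematic illustration only: the function names and weight values are made up for the example, and scikit-learn's estimators apply their own scaling constants to these terms.**

```python
# Schematic penalty terms; scikit-learn's estimators scale these terms differently.
def l1_penalty(weights, alpha):
    """alpha times the sum of the absolute values of the weights."""
    return alpha * sum(abs(w) for w in weights)


def l2_penalty(weights, alpha):
    """alpha times the sum of the squared weights."""
    return alpha * sum(w * w for w in weights)


def elastic_net_penalty(weights, alpha, l1_ratio):
    """Mix of both penalties; l1_ratio=1 is pure L1, l1_ratio=0 is pure L2."""
    return (l1_ratio * l1_penalty(weights, alpha)
            + (1 - l1_ratio) * l2_penalty(weights, alpha))


weights = [0.5, -2.0, 0.0]  # illustrative values, not taken from a fitted model
print("L1         :", l1_penalty(weights, alpha=0.1))
print("L2         :", l2_penalty(weights, alpha=0.1))
print("elastic net:", elastic_net_penalty(weights, alpha=0.1, l1_ratio=0.5))
```

**Note that the negative weight still increases the L1 penalty because its absolute value is used.**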

**In addition, every linear model predicts with a linear equation, so the fitted slope (coefficients) and intercept can be inspected directly.**

# **Import**

**Import pandas and numpy, the basic libraries for data analysis.**

**Import matplotlib and seaborn for visualization.**

**Import accuracy_score and mean_absolute_error to evaluate model performance.**

**Finally, import warnings so that warnings can be suppressed.**

In [None]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics import accuracy_score
from sklearn.metrics import mean_absolute_error

import warnings
warnings.filterwarnings(action='ignore')

# **Load the data**

**Pass a file path to pandas' read_csv function to read a CSV file.**

**The easiest way to get the path is the copy file path button to the right of the data file.**

In [None]:
df = pd.read_csv("../input/titanic/train.csv")

# **Explore the data**

In [None]:
df.info()

**The output shows that some columns have missing values.**

In [None]:
df.head()

In [None]:
df["Age"] = df["Age"].fillna(df["Age"].mean())

**Fill the missing values in the Age column with the column mean.**

In [None]:
X = df["Age"]
y = df["Fare"]

**Because we want to inspect the data with a scatter plot of continuous values, we assign two real-valued columns to X and y. Here X and y are simply the two plot axes (Age and Fare), not features and target.**

In [None]:
sns.scatterplot(x = X, y = y, hue=df["Survived"],markers="o")

**A few extreme Fare values compress the rest of the plot and make it hard to read.**

In [None]:
# np.log maps a fare of 0 to -inf, so non-positive results are overwritten with 1 afterwards.
df["Fare"] = np.log(df["Fare"])
df.loc[(df["Fare"]<=0),"Fare"] = 1
y = df["Fare"]

**Apply a log transform to Fare to reduce the influence of extreme values.**
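**A log transform needs care when the data contain zeros: NumPy's np.log returns -inf for 0. One common alternative, sketched here with the standard library and a few illustrative fare values, is log1p, which computes log(1 + x) and maps 0 to 0. This is an optional variation, not what the notebook's cell does.**

```python
import math

# Illustrative fare values, including a zero fare.
fares = [0.0, 7.25, 71.2833, 512.3292]

# log1p(x) = log(1 + x): defined at 0 and still compresses large values.
log1p_fares = [math.log1p(f) for f in fares]

for fare, transformed in zip(fares, log1p_fares):
    print(f"fare={fare:>9.4f}  log1p={transformed:.4f}")
```

**With NumPy or pandas the equivalent call would be np.log1p applied to the column.**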

In [None]:
sns.scatterplot(x = X, y = y, hue=df["Survived"],markers="o")

**A regression model's output is unbounded, which makes it awkward for classification problems where the set of classes is fixed.**

**LogisticRegression is a classification algorithm designed to address this.**

**It computes a probability between 0 and 1, which makes binary classification possible.**
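**The mapping from an unbounded linear score to a probability is done by the logistic (sigmoid) function. The sketch below uses plain Python; the coefficient and intercept values are invented for illustration and are not results from this notebook.**

```python
import math


def sigmoid(z):
    """Map any real score to a value strictly between 0 and 1 (numerically stable form)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(age, fare, coef_age, coef_fare, intercept):
    """Probability of the positive class for one sample of a two-feature model."""
    score = coef_age * age + coef_fare * fare + intercept
    return sigmoid(score)


# Hypothetical parameters, chosen only to show the shape of the computation.
p = predict_probability(age=30.0, fare=3.0, coef_age=-0.02, coef_fare=0.8, intercept=-2.0)
print("probability:", p)
print("predicted class:", int(p > 0.5))
```

**A score of 0 corresponds to a probability of 0.5, which is why the sign of the linear score decides the predicted class at the default threshold.**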

# **LogisticRegression**

In [None]:
from sklearn.linear_model import LogisticRegression

**Import the LogisticRegression model from sklearn.**

In [None]:
model = LogisticRegression()
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
sns.scatterplot(x = X, y = y, hue=pred,markers="o")
plt.show()

print("정확도    : " ,accuracy_score(pred,df["Survived"]))
print("기울기    : ",model.coef_[0][1])
print("절편      : ",model.intercept_[0])
print()
print("accuracy : ",accuracy_score(pred,df["Survived"]))
print("slope    : ",model.coef_[0][1])
print("intercept: ",model.intercept_[0])

**The model was fit and used for classification with its default options. The printed slope is model.coef_[0][1], the coefficient of Fare.**

# **-penalty**

**Selects the regularization penalty applied to the model.**

In [None]:
# Recent scikit-learn versions expect penalty=None; the string "none" was deprecated in 1.2.
model = LogisticRegression(penalty=None)
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
sns.scatterplot(x = X, y = y, hue=pred,markers="o")
plt.show()

print("정확도    : " ,accuracy_score(pred,df["Survived"]))
print("기울기    : ",model.coef_[0][1])
print("절편      : ",model.intercept_[0])
print()
print("accuracy : ",accuracy_score(pred,df["Survived"]))
print("slope    : ",model.coef_[0][1])
print("intercept: ",model.intercept_[0])

**The liblinear solver does not support fitting without a penalty; the cell above works because it uses the default solver (lbfgs in recent scikit-learn versions).**

In [None]:
model = LogisticRegression(penalty="l1",solver="saga")
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
sns.scatterplot(x = X, y = y, hue=pred,markers="o")
plt.show()

print("정확도    : " ,accuracy_score(pred,df["Survived"]))
print("기울기    : ",model.coef_[0][1])
print("절편      : ",model.intercept_[0])
print()
print("accuracy : ",accuracy_score(pred,df["Survived"]))
print("slope    : ",model.coef_[0][1])
print("intercept: ",model.intercept_[0])

**The L1 penalty is supported only by the liblinear and saga solvers.**

In [None]:
model = LogisticRegression(penalty="l2")
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
sns.scatterplot(x = X, y = y, hue=pred,markers="o")
plt.show()

print("정확도    : " ,accuracy_score(pred,df["Survived"]))
print("기울기    : ",model.coef_[0][1])
print("절편      : ",model.intercept_[0])
print()
print("accuracy : ",accuracy_score(pred,df["Survived"]))
print("slope    : ",model.coef_[0][1])
print("intercept: ",model.intercept_[0])

**The L2 penalty is supported by every solver.**

In [None]:
model = LogisticRegression(penalty="elasticnet",solver="saga",l1_ratio=0.5)
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
sns.scatterplot(x = X, y = y, hue=pred,markers="o")
plt.show()

print("정확도    : " ,accuracy_score(pred,df["Survived"]))
print("기울기    : ",model.coef_[0][1])
print("절편      : ",model.intercept_[0])
print()
print("accuracy : ",accuracy_score(pred,df["Survived"]))
print("slope    : ",model.coef_[0][1])
print("intercept: ",model.intercept_[0])

**The elastic net penalty applies L1 and L2 together, so l1_ratio must be set to choose the mix between them.**

**The elastic net penalty is supported only by the saga solver.**
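**The solver and penalty combinations mentioned in this section can be summarized in a small lookup table. The dictionary and helper function below are hypothetical conveniences written for this notebook, not part of scikit-learn, and they cover only the five solvers used here.**

```python
# Penalties supported by each solver used in this notebook (None means no penalty).
SUPPORTED_PENALTIES = {
    "lbfgs": {"l2", None},
    "liblinear": {"l1", "l2"},
    "newton-cg": {"l2", None},
    "sag": {"l2", None},
    "saga": {"l1", "l2", "elasticnet", None},
}


def is_supported(solver, penalty):
    """Return True if the solver accepts the given penalty."""
    return penalty in SUPPORTED_PENALTIES[solver]


print(is_supported("liblinear", "l1"))      # L1 works with liblinear
print(is_supported("lbfgs", "l1"))          # but not with lbfgs
print(is_supported("saga", "elasticnet"))   # elastic net needs saga
print(is_supported("liblinear", None))      # liblinear always regularizes
```

**Checking a combination before constructing the model gives a clearer message than the error raised during fit.**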

# **-fit_intercept**

**Determines whether the constant term b in the linear equation ax+b is fitted.**

In [None]:
model = LogisticRegression(fit_intercept=True)
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
sns.scatterplot(x = X, y = y, hue=pred,markers="o")
plt.show()

print("정확도    : " ,accuracy_score(pred,df["Survived"]))
print("기울기    : ",model.coef_[0][1])
print("절편      : ",model.intercept_[0])
print()
print("accuracy : ",accuracy_score(pred,df["Survived"]))
print("slope    : ",model.coef_[0][1])
print("intercept: ",model.intercept_[0])

**With fit_intercept=True the intercept is fitted from the data.**

In [None]:
model = LogisticRegression(fit_intercept=False)
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
sns.scatterplot(x = X, y = y, hue=pred,markers="o")
plt.show()

print("정확도    : " ,accuracy_score(pred,df["Survived"]))
print("기울기    : ",model.coef_[0][1])
print("절편      : ",model.intercept_[0])
print()
print("accuracy : ",accuracy_score(pred,df["Survived"]))
print("slope    : ",model.coef_[0][1])
print("intercept: ",model.intercept_[0])

**With fit_intercept=False there is no constant term, so the intercept is fixed at 0 and the linear score passes through the origin (0, 0).**

# **-class_weight**

**Sets per-class weights for imbalanced data.**

**The weights can be computed automatically ("balanced") or supplied manually as a dictionary.**

In [None]:
# class_weight accepts None (the default), "balanced", or a dict; the string "none" is not valid.
model = LogisticRegression(class_weight=None)
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
sns.scatterplot(x = X, y = y, hue=pred,markers="o")
plt.show()

print("정확도    : " ,accuracy_score(pred,df["Survived"]))
print("기울기    : ",model.coef_[0][1])
print("절편      : ",model.intercept_[0])
print()
print("accuracy : ",accuracy_score(pred,df["Survived"]))
print("slope    : ",model.coef_[0][1])
print("intercept: ",model.intercept_[0])

In [None]:
model = LogisticRegression(class_weight="balanced")
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
sns.scatterplot(x = X, y = y, hue=pred,markers="o")
plt.show()

print("정확도    : " ,accuracy_score(pred,df["Survived"]))
print("기울기    : ",model.coef_[0][1])
print("절편      : ",model.intercept_[0])
print()
print("accuracy : ",accuracy_score(pred,df["Survived"]))
print("slope    : ",model.coef_[0][1])
print("intercept: ",model.intercept_[0])
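**scikit-learn documents the "balanced" mode as weighting each class by `n_samples / (n_classes * np.bincount(y))`. The sketch below reproduces that formula in plain Python on a small made-up label list; the helper name is hypothetical.**

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Illustrative imbalanced labels: six samples of class 0, four of class 1.
labels = [0] * 6 + [1] * 4
weights = balanced_class_weights(labels)
print(weights)
```

**The rarer class receives the larger weight, so its errors count more during fitting.**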

# **-solver**

**Selects the optimization algorithm (solver).**

**For small datasets 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones.**

**For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle the multinomial loss.**

**'liblinear' is limited to one-versus-rest schemes.**

In [None]:
model = LogisticRegression(solver="newton-cg")
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
sns.scatterplot(x = X, y = y, hue=pred,markers="o")
plt.show()

print("정확도    : " ,accuracy_score(pred,df["Survived"]))
print("기울기    : ",model.coef_[0][1])
print("절편      : ",model.intercept_[0])
print()
print("accuracy : ",accuracy_score(pred,df["Survived"]))
print("slope    : ",model.coef_[0][1])
print("intercept: ",model.intercept_[0])

In [None]:
model = LogisticRegression(solver="lbfgs")
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
sns.scatterplot(x = X, y = y, hue=pred,markers="o")
plt.show()

print("정확도    : " ,accuracy_score(pred,df["Survived"]))
print("기울기    : ",model.coef_[0][1])
print("절편      : ",model.intercept_[0])
print()
print("accuracy : ",accuracy_score(pred,df["Survived"]))
print("slope    : ",model.coef_[0][1])
print("intercept: ",model.intercept_[0])

In [None]:
model = LogisticRegression(solver="liblinear")
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
sns.scatterplot(x = X, y = y, hue=pred,markers="o")
plt.show()

print("정확도    : " ,accuracy_score(pred,df["Survived"]))
print("기울기    : ",model.coef_[0][1])
print("절편      : ",model.intercept_[0])
print()
print("accuracy : ",accuracy_score(pred,df["Survived"]))
print("slope    : ",model.coef_[0][1])
print("intercept: ",model.intercept_[0])

In [None]:
model = LogisticRegression(solver="sag")
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
sns.scatterplot(x = X, y = y, hue=pred,markers="o")
plt.show()

print("정확도    : " ,accuracy_score(pred,df["Survived"]))
print("기울기    : ",model.coef_[0][1])
print("절편      : ",model.intercept_[0])
print()
print("accuracy : ",accuracy_score(pred,df["Survived"]))
print("slope    : ",model.coef_[0][1])
print("intercept: ",model.intercept_[0])

In [None]:
model = LogisticRegression(solver="saga")
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
sns.scatterplot(x = X, y = y, hue=pred,markers="o")
plt.show()

print("정확도    : " ,accuracy_score(pred,df["Survived"]))
print("기울기    : ",model.coef_[0][1])
print("절편      : ",model.intercept_[0])
print()
print("accuracy : ",accuracy_score(pred,df["Survived"]))
print("slope    : ",model.coef_[0][1])
print("intercept: ",model.intercept_[0])

# **-verbose**

**Controls how much information is printed during fitting.**

In [None]:
model = LogisticRegression(verbose=0)
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
sns.scatterplot(x = X, y = y, hue=pred,markers="o")
plt.show()

print("정확도    : " ,accuracy_score(pred,df["Survived"]))
print("기울기    : ",model.coef_[0][1])
print("절편      : ",model.intercept_[0])
print()
print("accuracy : ",accuracy_score(pred,df["Survived"]))
print("slope    : ",model.coef_[0][1])
print("intercept: ",model.intercept_[0])

**With verbose=0, no information is printed.**

In [None]:
model = LogisticRegression(verbose=1)
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
sns.scatterplot(x = X, y = y, hue=pred,markers="o")
plt.show()

print("정확도    : " ,accuracy_score(pred,df["Survived"]))
print("기울기    : ",model.coef_[0][1])
print("절편      : ",model.intercept_[0])
print()
print("accuracy : ",accuracy_score(pred,df["Survived"]))
print("slope    : ",model.coef_[0][1])
print("intercept: ",model.intercept_[0])

**With verbose=1, progress information from the solver is printed during training.**

# **linear regression**

**Next, let's look at linear regression, one of the earliest machine learning algorithms.**

**Linear regression predicts with a linear equation (ax+b).**
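**For a single feature, the slope a and intercept b of the least-squares line have a closed form. The sketch below computes them in plain Python on a tiny made-up dataset that lies exactly on y = 2x + 1; it illustrates the idea and is not how scikit-learn implements LinearRegression internally.**

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
slope, intercept = fit_line(xs, ys)
print("slope    :", slope)
print("intercept:", intercept)
```

**The fitted line always passes through the point (mean of x, mean of y), which is where the intercept formula comes from.**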

# **Load the data**

In [None]:
regression = pd.read_csv("../input/house-prices-advanced-regression-techniques/train.csv")
X = regression["YearBuilt"]
y = regression["SalePrice"]

**To make the regression easier to see, we use a dataset in which the two columns have a roughly linear relationship.**

In [None]:
sns.scatterplot(x = X, y = y,markers="o")

**The plot shows that more recently built houses tend to sell for higher prices.**

In [None]:
y = np.log(regression["SalePrice"])
y = y-12

**Apply a log transform to SalePrice to make the relationship closer to linear, then subtract 12 so that the values are roughly centered around 0.**

In [None]:
sns.scatterplot(x = X, y = y,markers="o")

**After the transform the upward trend is still visible: newer houses tend to have higher (log) prices.**

In [None]:
from sklearn.linear_model import LinearRegression

**Import the LinearRegression model from sklearn.**

In [None]:
model = LinearRegression()
model.fit(X.values.reshape(-1,1), y)
pred = model.predict(X.values.reshape(-1,1))
plt.plot(X, y, 'o')
plt.plot(X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**The regression was fit with the default options.**

# **-fit_intercept**

**Determines whether the constant term b in the linear equation ax+b is fitted.**

In [None]:
model = LinearRegression(fit_intercept = True)
model.fit(X.values.reshape(-1,1), y)
pred = model.predict(X.values.reshape(-1,1))
plt.plot(X, y, 'o')
plt.plot(X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**With fit_intercept=True the intercept is fitted from the data.**

In [None]:
model = LinearRegression(fit_intercept = False)
model.fit(X.values.reshape(-1,1), y)
pred = model.predict(X.values.reshape(-1,1))
plt.plot(X, y, 'o')
plt.plot(X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**With fit_intercept=False there is no constant term, so the fitted line passes through the origin (0, 0).**

# **-positive**

**Constrains the coefficients (slopes) of the linear equation to be non-negative.**

In [None]:
model = LinearRegression(positive = True)
model.fit(X.values.reshape(-1,1), y)
pred = model.predict(X.values.reshape(-1,1))
plt.plot(X, y, 'o')
plt.plot(X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**With positive=True the fitted slope is non-negative. The data trend upward, so the constraint does not get in the way here.**

In [None]:
positive_X = X*-1

**The data trend upward, so negate X to flip the trend.**

In [None]:
sns.scatterplot(x = positive_X, y = y,markers="o")

**The data now slope downward from left to right.**

In [None]:
model = LinearRegression(positive = True)
model.fit(positive_X.values.reshape(-1,1), y)
pred = model.predict(positive_X.values.reshape(-1,1))
plt.plot(positive_X, y, 'o')
plt.plot(positive_X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**On downward-sloping data with positive=True, the slope cannot be negative, so it is set to 0.**
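**Why the slope ends up at exactly 0 can be seen in the one-feature case: the squared error is a parabola in the slope, so under a non-negativity constraint the best slope is the unconstrained one clipped at 0, and the intercept then becomes the mean of y. The sketch below shows this in plain Python on made-up data; scikit-learn solves the general multi-feature problem numerically, so this is only an illustration of the one-feature case.**

```python
def fit_line_nonnegative(xs, ys):
    """Least squares for one feature with the slope constrained to be >= 0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # Clip the unconstrained slope at 0; the intercept follows from the means.
    slope = max(0.0, sxy / sxx)
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0]
ys_up = [3.0, 5.0, 7.0, 9.0]    # upward trend: constraint is inactive
ys_down = [9.0, 7.0, 5.0, 3.0]  # downward trend: slope is clipped to 0

print("upward  :", fit_line_nonnegative(xs, ys_up))
print("downward:", fit_line_nonnegative(xs, ys_down))
```

**In the downward case the model predicts the mean of y for every input, which is the flat line seen in the plot.**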

In [None]:
model = LinearRegression(positive = False)
model.fit(positive_X.values.reshape(-1,1), y)
pred = model.predict(positive_X.values.reshape(-1,1))
plt.plot(positive_X, y, 'o')
plt.plot(positive_X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**With positive=False the slope is fitted freely and can be positive or negative.**

# **Lasso**

**Lasso is LinearRegression with L1 regularization applied.**

In [None]:
from sklearn.linear_model import Lasso

**Import the Lasso model from sklearn.**

In [None]:
model = Lasso()
model.fit(X.values.reshape(-1,1), y)
pred = model.predict(X.values.reshape(-1,1))
plt.plot(X, y, 'o')
plt.plot(X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**The regression was fit with the default options.**

# **-alpha**

**Sets the regularization strength.**

In [None]:
model = Lasso(alpha=1)
model.fit(X.values.reshape(-1,1), y)
pred = model.predict(X.values.reshape(-1,1))
plt.plot(X, y, 'o')
plt.plot(X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**With alpha at its default value of 1, the fit is the same as in the previous cell.**

In [None]:
model = Lasso(alpha=0.1)
model.fit(X.values.reshape(-1,1), y)
pred = model.predict(X.values.reshape(-1,1))
plt.plot(X, y, 'o')
plt.plot(X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**Changing the regularization strength changes the fitted line and therefore the predictions.**
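**The effect of alpha on a Lasso slope can be worked out by hand for a single feature. scikit-learn documents the Lasso objective as (1 / (2 n_samples)) times the squared error plus alpha times the L1 norm of the weights; for one centered feature its minimizer is a soft-thresholded covariance divided by the variance. The plain-Python sketch below uses made-up data and hypothetical helper names.**

```python
def soft_threshold(z, alpha):
    """Shrink z toward 0 by alpha; values within [-alpha, alpha] become exactly 0."""
    if z > alpha:
        return z - alpha
    if z < -alpha:
        return z + alpha
    return 0.0


def lasso_slope_1d(xs, ys, alpha):
    """Slope of a one-feature Lasso fit with an unpenalized intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var = sum((x - mean_x) ** 2 for x in xs) / n
    return soft_threshold(cov, alpha) / var


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
for alpha in (0.0, 0.5, 3.0):
    print(f"alpha={alpha}: slope={lasso_slope_1d(xs, ys, alpha)}")
```

**Once alpha exceeds the covariance between the feature and the target, the slope is exactly 0, which is the feature-selection behavior L1 regularization is known for.**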

# **-fit_intercept**

**Determines whether the constant term b in the linear equation ax+b is fitted.**

In [None]:
model = Lasso(fit_intercept = True)
model.fit(X.values.reshape(-1,1), y)
pred = model.predict(X.values.reshape(-1,1))
plt.plot(X, y, 'o')
plt.plot(X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**With fit_intercept=True the intercept is fitted from the data.**

In [None]:
model = Lasso(fit_intercept = False)
model.fit(X.values.reshape(-1,1), y)
pred = model.predict(X.values.reshape(-1,1))
plt.plot(X, y, 'o')
plt.plot(X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**With fit_intercept=False there is no constant term, so the fitted line passes through the origin (0, 0).**

# **-positive**

**Constrains the coefficients (slopes) of the linear equation to be non-negative.**

In [None]:
model = Lasso(positive = True)
model.fit(X.values.reshape(-1,1), y)
pred = model.predict(X.values.reshape(-1,1))
plt.plot(X, y, 'o')
plt.plot(X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**With positive=True the fitted slope is non-negative.**

In [None]:
model = Lasso(positive = True)
model.fit(positive_X.values.reshape(-1,1), y)
pred = model.predict(positive_X.values.reshape(-1,1))
plt.plot(positive_X, y, 'o')
plt.plot(positive_X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**On downward-sloping data with positive=True, the slope cannot be negative, so it is set to 0.**

In [None]:
model = Lasso(positive = False)
model.fit(positive_X.values.reshape(-1,1), y)
pred = model.predict(positive_X.values.reshape(-1,1))
plt.plot(positive_X, y, 'o')
plt.plot(positive_X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**With positive=False the slope is fitted freely and can be positive or negative.**

# **Ridge**

**Ridge is LinearRegression with L2 regularization applied.**

In [None]:
from sklearn.linear_model import Ridge

**Import the Ridge model from sklearn.**

In [None]:
model = Ridge()
model.fit(X.values.reshape(-1,1), y)
pred = model.predict(X.values.reshape(-1,1))
plt.plot(X, y, 'o')
plt.plot(X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**The regression was fit with the default options.**

# **-alpha**

**Sets the regularization strength.**

In [None]:
model = Ridge(alpha=1)
model.fit(X.values.reshape(-1,1), y)
pred = model.predict(X.values.reshape(-1,1))
plt.plot(X, y, 'o')
plt.plot(X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**With alpha at its default value of 1, the fit is the same as in the previous cell.**

In [None]:
model = Ridge(alpha=0.1)
model.fit(X.values.reshape(-1,1), y)
pred = model.predict(X.values.reshape(-1,1))
plt.plot(X, y, 'o')
plt.plot(X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**Changing the regularization strength changes the fitted line and therefore the predictions.**
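**For Ridge the one-feature case also has a closed form. scikit-learn documents the Ridge objective as the squared error plus alpha times the squared L2 norm of the weights; with an unpenalized intercept the slope becomes Sxy / (Sxx + alpha). The plain-Python sketch below uses made-up data and a hypothetical helper name, and also shows that the same alpha has much less effect when the feature has a large spread.**

```python
def ridge_slope_1d(xs, ys, alpha):
    """Slope of a one-feature Ridge fit with an unpenalized intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # alpha is added to the denominator, shrinking the slope toward 0.
    return sxy / (sxx + alpha)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
for alpha in (0.0, 1.0, 5.0):
    print(f"alpha={alpha}: slope={ridge_slope_1d(xs, ys, alpha)}")

# Same data with the feature multiplied by 100: alpha=1 barely changes the slope.
wide_xs = [100.0 * x for x in xs]
print("wide feature, alpha=0:", ridge_slope_1d(wide_xs, ys, 0.0))
print("wide feature, alpha=1:", ridge_slope_1d(wide_xs, ys, 1.0))
```

**Unlike Lasso, Ridge shrinks the slope smoothly and does not set it to exactly 0 for any finite alpha. Because alpha is compared against Sxx, features are often standardized before regularized models are fit.**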

# **-fit_intercept**

**Determines whether the constant term b in the linear equation ax+b is fitted.**

In [None]:
model = Ridge(fit_intercept = True)
model.fit(X.values.reshape(-1,1), y)
pred = model.predict(X.values.reshape(-1,1))
plt.plot(X, y, 'o')
plt.plot(X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**With fit_intercept=True the intercept is fitted from the data.**

In [None]:
model = Ridge(fit_intercept = False)
model.fit(X.values.reshape(-1,1), y)
pred = model.predict(X.values.reshape(-1,1))
plt.plot(X, y, 'o')
plt.plot(X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**With fit_intercept=False there is no constant term, so the fitted line passes through the origin (0, 0).**

# **-positive**

**Constrains the coefficients (slopes) of the linear equation to be non-negative.**

In [None]:
model = Ridge(positive = True)
model.fit(X.values.reshape(-1,1), y)
pred = model.predict(X.values.reshape(-1,1))
plt.plot(X, y, 'o')
plt.plot(X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**With positive=True the fitted slope is non-negative.**

In [None]:
model = Ridge(positive = True)
model.fit(positive_X.values.reshape(-1,1), y)
pred = model.predict(positive_X.values.reshape(-1,1))
plt.plot(positive_X, y, 'o')
plt.plot(positive_X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**On downward-sloping data with positive=True, the slope cannot be negative, so it is set to 0.**

In [None]:
model = Ridge(positive = False)
model.fit(positive_X.values.reshape(-1,1), y)
pred = model.predict(positive_X.values.reshape(-1,1))
plt.plot(positive_X, y, 'o')
plt.plot(positive_X,pred,'o')
plt.show()

print("오차      : " ,mean_absolute_error(pred,y))
print("기울기    : ",model.coef_[0])
print("절편      : ",model.intercept_)
print()
print("error    : ",mean_absolute_error(pred,y))
print("slope    : ",model.coef_[0])
print("intercept: ",model.intercept_)

**With positive=False the slope is fitted freely and can be positive or negative.**

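# **Classification using regression**

**A regression model can be used to predict a probability for each class, and classification can then be done based on that probability.**

**In practice, many algorithms classify by comparing a probability with a threshold.**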

In [None]:
X = df["Age"]
y = df["Fare"]

In [None]:
model = LinearRegression()
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
pred = pred.reshape(-1)
sns.scatterplot(x = X, y = y, hue=pred,markers="o")

**LinearRegression can be used to predict a probability-like score for the class. Unlike LogisticRegression, the score is not restricted to the range 0 to 1.**

In [None]:
model = LinearRegression()
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
pred = pred.reshape(-1)
pred = np.where(pred>0.6,1,0)
sns.scatterplot(x = X, y = y, hue=pred,markers="o")
plt.show()

print("정확도    : " ,accuracy_score(pred,df["Survived"]))
print("기울기    : ",model.coef_[1])
print("절편      : ",model.intercept_)
print()
print("accuracy : ",accuracy_score(pred,df["Survived"]))
print("slope    : ",model.coef_[1])
print("intercept: ",model.intercept_)

**Scores above a chosen threshold (0.6 here) are assigned to class 1 and the rest to class 0.**
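**The thresholding step can be sketched on its own in plain Python. The scores below are made up to include values outside the 0 to 1 range, which a linear regression output can produce; the helper name is hypothetical.**

```python
def classify(scores, threshold=0.5):
    """Assign class 1 to scores strictly above the threshold, class 0 otherwise."""
    return [1 if s > threshold else 0 for s in scores]


# Illustrative regression outputs; note the values below 0 and above 1.
scores = [-0.1, 0.3, 0.55, 0.7, 1.2]

print("threshold 0.5:", classify(scores))
print("threshold 0.6:", classify(scores, threshold=0.6))
```

**Raising the threshold makes the model more conservative about predicting class 1. Rounding the scores instead behaves like a threshold near 0.5 for scores between 0 and 1, but a score above 1.5 would round to 2, which is not a valid class label, so an explicit threshold is the safer choice.**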

In [None]:
model = LinearRegression(fit_intercept = False)
model.fit(df[["Age","Fare"]], df["Survived"])
pred = model.predict(df[["Age","Fare"]])
pred = np.round(pred)
pred = pred.reshape(-1)
sns.scatterplot(x = X, y = y, hue=pred,markers="o")
plt.show()

print("정확도    : " ,accuracy_score(pred,df["Survived"]))
print("기울기    : ",model.coef_[1])
print("절편      : ",model.intercept_)
print()
print("accuracy : ",accuracy_score(pred,df["Survived"]))
print("slope    : ",model.coef_[1])
print("intercept: ",model.intercept_)

**The options covered above, such as fit_intercept, can also be set here to adjust the predictions. In this cell the scores are rounded to the nearest integer instead of being compared with a threshold.**

In [None]:
# Lasso (L1) with a strong penalty; rounding acts as a 0.5 threshold for outputs in [0, 1]
model = Lasso(alpha=1)
model.fit(df[["Age", "Fare"]], df["Survived"])
pred = model.predict(df[["Age", "Fare"]])
pred = np.round(pred)
sns.scatterplot(x=X, y=y, hue=pred, marker="o")
plt.show()

print("accuracy    : ", accuracy_score(df["Survived"], pred))
print("slope (Fare): ", model.coef_[1])
print("intercept   : ", model.intercept_)

**The Lasso model was used to predict the probability for classification, but with `alpha=1` the regularization was too strong for it to predict correctly.**
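**The general mechanism behind this can be sketched on synthetic data: a large L1 penalty can shrink every coefficient to zero, leaving only the intercept, so every sample gets the same prediction. The data below uses standardized random features, which is an assumption and differs from the notebook's unscaled `Age` and `Fare`, so the exact behavior there may not match.**

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic stand-in data (assumption): standardized features, noisy 0/1 label
rng = np.random.default_rng(2)
features = rng.normal(size=(300, 2))
labels = (features[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# Same data, strong versus weak L1 penalty
strong = Lasso(alpha=1).fit(features, labels)
weak = Lasso(alpha=0.01).fit(features, labels)

print("alpha=1    coef:", strong.coef_, "intercept:", strong.intercept_)
print("alpha=0.01 coef:", weak.coef_, "intercept:", weak.intercept_)

# Distinct class values obtained after rounding the predictions
strong_classes = np.unique(np.round(strong.predict(features)))
weak_classes = np.unique(np.round(weak.predict(features)))
print("classes predicted with alpha=1   :", strong_classes)
print("classes predicted with alpha=0.01:", weak_classes)
```

**When the coefficients are all zero, the prediction equals the intercept for every row, which is why rounding yields a single class.**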

In [None]:
# Lasso with a weaker penalty (alpha=0.1)
model = Lasso(alpha=0.1)
model.fit(df[["Age", "Fare"]], df["Survived"])
pred = model.predict(df[["Age", "Fare"]])
pred = np.round(pred)
sns.scatterplot(x=X, y=y, hue=pred, marker="o")
plt.show()

print("accuracy    : ", accuracy_score(df["Survived"], pred))
print("slope (Fare): ", model.coef_[1])
print("intercept   : ", model.intercept_)

**As the regularization is weakened, the model starts to produce usable predictions.**

In [None]:
# Lasso with an even weaker penalty (alpha=0.01)
model = Lasso(alpha=0.01)
model.fit(df[["Age", "Fare"]], df["Survived"])
pred = model.predict(df[["Age", "Fare"]])
pred = np.round(pred)
sns.scatterplot(x=X, y=y, hue=pred, marker="o")
plt.show()

print("accuracy    : ", accuracy_score(df["Survived"], pred))
print("slope (Fare): ", model.coef_[1])
print("intercept   : ", model.intercept_)

**Relaxing the regularization further changes the accuracy.**

In [None]:
# Lasso with a very weak penalty (alpha=0.001)
model = Lasso(alpha=0.001)
model.fit(df[["Age", "Fare"]], df["Survived"])
pred = model.predict(df[["Age", "Fare"]])
pred = np.round(pred)
sns.scatterplot(x=X, y=y, hue=pred, marker="o")
plt.show()

print("accuracy    : ", accuracy_score(df["Survived"], pred))
print("slope (Fare): ", model.coef_[1])
print("intercept   : ", model.intercept_)

**It is recommended to try several regularization strengths and look for a suitable value.**
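**Instead of editing `alpha` one cell at a time, the candidates can be compared in a loop. This is a minimal sketch on synthetic data with a held-out validation split; the data, the split, the candidate list, and the names `valid_accuracy` and `best_alpha` are assumptions for illustration.**

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): two features and a noisy 0/1 label
rng = np.random.default_rng(3)
features = rng.normal(size=(400, 2))
noise = rng.normal(scale=0.5, size=400)
labels = (features[:, 0] - features[:, 1] + noise > 0).astype(int)

# Hold out part of the data so each alpha is scored on unseen rows
X_train, X_valid, y_train, y_valid = train_test_split(
    features, labels, test_size=0.25, random_state=0
)

# Fit one Lasso per candidate alpha and threshold its output at 0.5
alphas = [1, 0.1, 0.01, 0.001]
valid_accuracy = {}
for alpha in alphas:
    model = Lasso(alpha=alpha).fit(X_train, y_train)
    predicted = np.where(model.predict(X_valid) > 0.5, 1, 0)
    valid_accuracy[alpha] = accuracy_score(y_valid, predicted)
    print(f"alpha={alpha:<6} validation accuracy={valid_accuracy[alpha]:.3f}")

# Pick the alpha with the highest validation accuracy
best_alpha = max(valid_accuracy, key=valid_accuracy.get)
print("best alpha:", best_alpha)
```

**The same loop works for `Ridge` by swapping the model class.**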

In [None]:
# Ridge (L2) regression; rounding acts as a 0.5 threshold for outputs in [0, 1]
model = Ridge(alpha=0.01)
model.fit(df[["Age", "Fare"]], df["Survived"])
pred = model.predict(df[["Age", "Fare"]])
pred = np.round(pred)
sns.scatterplot(x=X, y=y, hue=pred, marker="o")
plt.show()

print("accuracy    : ", accuracy_score(df["Survived"], pred))
print("slope (Fare): ", model.coef_[1])
print("intercept   : ", model.intercept_)

**With L2 regularization applied (Ridge), the probability can likewise be predicted and used for classification.**
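**scikit-learn also ships `RidgeClassifier`, which converts the labels to -1 and 1 and then treats classification as a ridge regression problem. The sketch below compares it with the manual approach of thresholding `Ridge` at 0.5; the synthetic data and the names `manual_pred`, `builtin_pred`, and `agreement` are assumptions for illustration.**

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeClassifier

# Synthetic stand-in data (assumption): two features and a 0/1 label
rng = np.random.default_rng(4)
features = rng.normal(size=(300, 2))
labels = (features[:, 0] + features[:, 1] > 0).astype(int)

# Manual approach: Ridge regression on the 0/1 labels, thresholded at 0.5
ridge = Ridge(alpha=0.01).fit(features, labels)
manual_pred = np.where(ridge.predict(features) > 0.5, 1, 0)

# Built-in approach: RidgeClassifier with the same penalty strength
clf = RidgeClassifier(alpha=0.01).fit(features, labels)
builtin_pred = clf.predict(features)

# Fraction of samples on which the two approaches agree
agreement = (manual_pred == builtin_pred).mean()
print("agreement between the two approaches:", agreement)
print("accuracy of RidgeClassifier         :", (builtin_pred == labels).mean())
```

**For a binary label the two approaches should agree closely, since a 0.5 threshold on 0/1 targets corresponds to a 0 threshold on -1/1 targets.**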

In [None]:
# Ridge with a weaker penalty (alpha=0.001)
model = Ridge(alpha=0.001)
model.fit(df[["Age", "Fare"]], df["Survived"])
pred = model.predict(df[["Age", "Fare"]])
pred = np.round(pred)
sns.scatterplot(x=X, y=y, hue=pred, marker="o")
plt.show()

print("accuracy    : ", accuracy_score(df["Survived"], pred))
print("slope (Fare): ", model.coef_[1])
print("intercept   : ", model.intercept_)

**As with Lasso, it is recommended to try several regularization strengths and look for a suitable value.**

**The advantage of linear models is that their predictions are intuitive and easy to interpret.**

**However, that same simplicity means they cannot handle every kind of case.**

**Next time, I will introduce machine learning models based on other algorithms.**