- Enable logging so that all printouts from your program go directly to a log file named *logfile.txt*, which you also need to submit alongside your source code.


In [1]:
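import logging

# Logging config: write every message at DEBUG level or above to logfile.txt,
# overwriting the file on each run (filemode='w').
logging.basicConfig(
    level=logging.DEBUG,
    format='%(levelname)-6s | %(asctime)s | %(message)s',
    filename='logfile.txt',
    filemode='w',
)
logger = logging.getLogger()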
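
A minimal sketch of how messages could then be sent to the log file instead of being printed. It is a standalone demo under stated assumptions: the temporary path `logfile_demo.txt`, the `n_rows` value, and the column names are illustrative only, and `force=True` (Python 3.8+) is added so the configuration still takes effect if the root logger already has handlers, for example after re-running the cell.

```python
import logging
import os
import tempfile

# Hypothetical demo path; the assignment itself writes to 'logfile.txt'.
log_path = os.path.join(tempfile.mkdtemp(), 'logfile_demo.txt')

logging.basicConfig(
    level=logging.DEBUG,
    format='%(levelname)-6s | %(asctime)s | %(message)s',
    filename=log_path,
    filemode='w',
    force=True,  # assumption: replace any handlers that are already installed
)
logger = logging.getLogger()

n_rows = 1234  # illustrative value, not read from the data here
logger.info('Loaded %d rows', n_rows)              # use instead of print(...)
logger.debug('Columns: %s', ['col_a', 'col_b'])    # illustrative column names

logging.shutdown()  # flush and close the file handler before reading it back
with open(log_path) as fh:
    log_text = fh.read()
```

Each call produces one line in the file, formatted as level, timestamp, and message.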

- Read A.csv data.

In [2]:
import pandas as pd
df = pd.read_csv('deliverables/A.csv', sep=';', header = 0)
df.head()

Unnamed: 0,wavelet_transformed_variance,wavelet_transformed_skewness,wavelet_transformed_curtosis,image_entropy,counterfeit
0,-3.5985,-13.6593,17.6052,-2.4927,1
1,-2.0662,0.16967,-1.0054,-0.82975,1
2,3.9922,-4.4676,3.7304,-0.1095,0
3,4.2134,-2.806,2.0116,0.67412,0
4,4.3398,-5.3036,3.8803,-0.70432,0


In [3]:
df.shape

(1234, 5)

In [4]:
df.columns

Index(['wavelet_transformed_variance', 'wavelet_transformed_skewness',
       'wavelet_transformed_curtosis', 'image_entropy', 'counterfeit'],
      dtype='object')

- Separate the dependent variable y as the counterfeit column, and the rest of the variables as independent variables (as X).

In [5]:
# Separate dependent variable y as the counterfeit column
variable_y = df['counterfeit']
variable_y

0       1
1       1
2       0
3       0
4       0
       ..
1229    0
1230    0
1231    0
1232    1
1233    0
Name: counterfeit, Length: 1234, dtype: int64

In [6]:
# rest of the variables as independent variables (as X).
variable_x = df.drop(columns='counterfeit')
variable_x.shape

(1234, 4)

In [7]:
# Reshape y into a column vector of shape (n_samples, 1)
variable_y = variable_y.values.reshape(-1,1)
variable_y.shape

(1234, 1)

- Split the dataset, i.e., (X, y), into training (50%) and test (50%) sets.

In [8]:
nrows = len(df)
split = nrows//2
split

617

In [9]:
training_set_y = variable_y[:split]
print(training_set_y.shape)
testing_set_y = variable_y[split:]
print(testing_set_y.shape)

(617, 1)
(617, 1)


In [10]:
training_set_x = variable_x[:split]
print(training_set_x.shape)
testing_set_x = variable_x[split:]
print(testing_set_x.shape)

(617, 4)
(617, 4)
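
The slicing above takes the first half of the rows for training and the second half for testing, in file order. If the rows happen to be sorted in some way, a shuffled and stratified split may be preferable. The following is a sketch of that alternative using scikit-learn's `train_test_split`; the synthetic `X_demo` and `y_demo` arrays and the `random_state` value are assumptions for illustration, not part of the assignment data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for variable_x and variable_y (assumption, demo only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = np.repeat([0, 1], 50)

# 50/50 split, shuffled, keeping the class ratio the same in both halves.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo,
    y_demo,
    test_size=0.5,
    shuffle=True,
    stratify=y_demo,
    random_state=0,  # arbitrary seed so the split is reproducible
)
print(X_tr.shape, X_te.shape)
```

Whether shuffling is appropriate depends on the assignment; the plain slice is deterministic and matches the row order of A.csv.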


- Load the scaler object from *scaler.joblib*. It has already been fit to a set of training samples, so there is no need to fit it again. Instead, use the already-fitted scaler to transform both the training and test sets, as in the lines below. Do not scale the dependent variable (i.e., y, the target column counterfeit).


In [11]:
from sklearn.preprocessing import StandardScaler  # class of the persisted object
import joblib

# Load the pre-fitted scaler; do not call fit() again.
scaler = joblib.load('deliverables/scaler.joblib')
X_train_scaled = scaler.transform(training_set_x)
X_test_scaled = scaler.transform(testing_set_x)
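
To illustrate why no refit is needed, here is a small self-contained sketch of the save, load, and transform round trip. It assumes a `StandardScaler` persisted with `joblib.dump`; the synthetic arrays and the temporary file name `scaler_demo.joblib` are hypothetical and unrelated to the provided *scaler.joblib*.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the samples the scaler was fit on (assumption).
rng = np.random.default_rng(0)
X_fit = rng.normal(loc=5.0, scale=2.0, size=(200, 4))
X_new = rng.normal(loc=5.0, scale=2.0, size=(10, 4))

# Fit once and persist to a hypothetical demo file.
path = os.path.join(tempfile.mkdtemp(), 'scaler_demo.joblib')
joblib.dump(StandardScaler().fit(X_fit), path)

# Later: load and transform only. The stored mean and scale are reused as-is.
loaded = joblib.load(path)
X_new_scaled = loaded.transform(X_new)

# Same result as standardizing by hand with the statistics of X_fit.
manual = (X_new - X_fit.mean(axis=0)) / X_fit.std(axis=0)
print(np.allclose(X_new_scaled, manual))
```

The loaded object carries the statistics learned at fit time, so new data is scaled with those statistics rather than its own.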



See the scikit-learn notes on the security and maintainability limitations of persisted models: https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations


- Build your first classifier with LogisticRegression.

In [12]:
import numpy as np
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(penalty='l1', tol=1, solver='liblinear', multi_class='auto', fit_intercept=False, max_iter=3)
