
# Collection of scikit-learn commands for machine learning



###  Importing Modules, Classes and Functions:
scikit-learn (sklearn) is organized into modules that group its machine learning algorithms, utilities, and tools. This modular structure lets you import only the specific modules, classes, and functions a task needs.

For example, you can import the **svm** module, which contains implementations of Support Vector Machine (SVM) algorithms. These are powerful supervised learning models used for classification and regression tasks.

From the svm module, you can import the **SVC** class, which stands for Support Vector Classification (the name is capitalized, and Python names are case-sensitive). It is a supervised learning algorithm that is widely used for *classification* tasks. It handles binary classification directly and multiclass classification through a one-vs-one scheme.

An SVC instance provides methods for training and prediction. These are called on the object rather than imported separately. Some commonly used methods include:

*   **fit(X, y)**: Trains the SVC model on the input data X and target labels y.
*   **predict(X)**: Predicts the target labels for the input data X.
*   **decision_function(X)**: Evaluates the decision function for the samples in X.
*   **score(X, y)**: Computes the mean accuracy of the SVC model on the given test data and labels.
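
The four methods listed above can be tried together in a minimal sketch. It assumes the built-in iris dataset and a default SVC, and it scores the model on the same data it was trained on, which is only meant to show the calls and not to measure real performance.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Load the iris features and labels as NumPy arrays
X, y = load_iris(return_X_y=True)

clf = SVC()      # default RBF kernel
clf.fit(X, y)    # train on the full dataset (illustration only)

labels = clf.predict(X[:3])             # predicted class for each of the 3 samples
scores = clf.decision_function(X[:3])   # one column per class for this 3-class problem
accuracy = clf.score(X, y)              # mean accuracy on the given data

print(labels)
print(scores.shape)
print(accuracy)
```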



In [2]:
# Import the svm module from scikit-learn
from sklearn import svm

# Import SVC class from svm module
from sklearn.svm import SVC

# Create an instance of SVC
svc_classifier = SVC()

# Now, svc_classifier can be used to fit the model to data and make predictions

###  Loading Example Datasets:
scikit-learn comes with a few standard datasets, for instance the **iris** and **digits** datasets for classification and the **diabetes** dataset for regression.


In [2]:
# Import the datasets module from scikit-learn
from sklearn import datasets

# Load the Iris dataset
iris = datasets.load_iris()
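
The other datasets mentioned above load the same way. The following sketch, assuming nothing beyond the bundled loaders, prints the shape of each feature array.

```python
from sklearn import datasets

digits = datasets.load_digits()      # classification: 8x8 images of handwritten digits
diabetes = datasets.load_diabetes()  # regression: disease progression target

# Each .data array has shape (n_samples, n_features)
print(digits.data.shape)
print(diabetes.data.shape)
```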


### Loading External Datasets:

In Google Colab, you can upload files directly to the runtime environment:

*   Click on the "Files" tab on the left sidebar in Google Colab.
*   Click on the "Upload to session storage" icon.
*   Select the file from your local system and upload it.


For example, after uploading 'payment_fraud.csv', you can read it into a DataFrame using pd.read_csv('payment_fraud.csv').

In [4]:
# Use pandas to load CSV data
import pandas as pd
data = pd.read_csv('payment_fraud.csv')
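
Once a CSV is in a DataFrame, the usual next step is to separate features from the target. The sketch below uses a small in-memory CSV as a stand-in for an uploaded file. The column names (amount, account_age_days, label) are hypothetical and are not taken from payment_fraud.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for an uploaded file; real column names will differ
csv_text = io.StringIO(
    "amount,account_age_days,label\n"
    "12.5,30,0\n"
    "990.0,1,1\n"
    "45.0,400,0\n"
)
data = pd.read_csv(csv_text)

# Separate the feature columns from the target column
X = data.drop(columns=["label"])
y = data["label"]

print(data.head())
print(X.shape, y.shape)
```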


### Accessing Dataset Attributes:
A dataset is a dictionary-like object that holds all the data and some metadata about the data. The data is stored in the .data member, which is an array of shape (n_samples, n_features). In the case of supervised problems, one or more response variables are stored in the .target member.
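
Before printing the full arrays, it can help to look at their shapes and the accompanying metadata. This is a small sketch using the iris dataset; feature_names and target_names are members of the object that load_iris returns.

```python
from sklearn import datasets

iris = datasets.load_iris()

print(iris.data.shape)      # (n_samples, n_features)
print(iris.target.shape)    # one label per sample
print(iris.feature_names)   # column names for iris.data
print(iris.target_names)    # class names for the integer labels
print(list(iris.keys()))    # everything the dictionary-like object holds
```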

In [5]:
print(iris.data)
print(iris.target)



[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]
 [5.4 3.7 1.5 0.2]
 [4.8 3.4 1.6 0.2]
 [4.8 3.  1.4 0.1]
 [4.3 3.  1.1 0.1]
 [5.8 4.  1.2 0.2]
 [5.7 4.4 1.5 0.4]
 [5.4 3.9 1.3 0.4]
 [5.1 3.5 1.4 0.3]
 [5.7 3.8 1.7 0.3]
 [5.1 3.8 1.5 0.3]
 [5.4 3.4 1.7 0.2]
 [5.1 3.7 1.5 0.4]
 [4.6 3.6 1.  0.2]
 [5.1 3.3 1.7 0.5]
 [4.8 3.4 1.9 0.2]
 [5.  3.  1.6 0.2]
 [5.  3.4 1.6 0.4]
 [5.2 3.5 1.5 0.2]
 [5.2 3.4 1.4 0.2]
 [4.7 3.2 1.6 0.2]
 [4.8 3.1 1.6 0.2]
 [5.4 3.4 1.5 0.4]
 [5.2 4.1 1.5 0.1]
 [5.5 4.2 1.4 0.2]
 [4.9 3.1 1.5 0.2]
 [5.  3.2 1.2 0.2]
 [5.5 3.5 1.3 0.2]
 [4.9 3.6 1.4 0.1]
 [4.4 3.  1.3 0.2]
 [5.1 3.4 1.5 0.2]
 [5.  3.5 1.3 0.3]
 [4.5 2.3 1.3 0.3]
 [4.4 3.2 1.3 0.2]
 [5.  3.5 1.6 0.6]
 [5.1 3.8 1.9 0.4]
 [4.8 3.  1.4 0.3]
 [5.1 3.8 1.6 0.2]
 [4.6 3.2 1.4 0.2]
 [5.3 3.7 1.5 0.2]
 [5.  3.3 1.4 0.2]
 [7.  3.2 4.7 1.4]
 [6.4 3.2 4.5 1.5]
 [6.9 3.1 4.

###  Creating Estimator:
**Purpose:** Create an estimator for classification using Support Vector Classification (SVC).

*   **gamma**: Controls how far the influence of a single training example reaches. A low value of gamma gives a smoother decision boundary, while a high value makes the decision boundary more jagged.
*   **C**: Controls the trade-off between achieving a low training error and keeping the margin around the decision boundary as large as possible. A small value of C makes the decision boundary smoother, while a large value lets the decision boundary fit the training data more closely, possibly resulting in overfitting.

So, when you execute clf = svm.SVC(gamma=0.001, C=100.), you're creating an instance of the SVC classifier with the specified parameter values for gamma and C. This instance (clf) can then be used to fit the model to training data and make predictions on new data.


In [6]:
clf = svm.SVC(gamma=0.001, C=100.)
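
To get a feel for gamma, the sketch below fits the same classifier with three gamma values and reports training accuracy and the number of support vectors. The values 0.1 and 10.0 are arbitrary choices for illustration. Training accuracy alone does not reveal overfitting, so treat the numbers as a rough indication only.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

results = []
for gamma in (0.001, 0.1, 10.0):
    clf = SVC(gamma=gamma, C=100.0).fit(X, y)
    # n_support_ holds the number of support vectors per class
    results.append((gamma, clf.score(X, y), int(clf.n_support_.sum())))

for gamma, train_accuracy, n_support in results:
    print(f"gamma={gamma}: training accuracy={train_accuracy:.3f}, support vectors={n_support}")
```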


###  Fitting Estimator:
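**Purpose:** Fit the estimator to the training data. Here every sample except the last one is used for training, which leaves the last sample available for prediction.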


In [8]:
clf.fit(iris.data[:-1], iris.target[:-1])


###  Predicting with Estimator:
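**Purpose:** Use the fitted estimator to make predictions. The slice iris.data[-1:] keeps the last sample as a 2-D array of shape (1, 4), which is the input shape predict() expects.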


In [9]:
prediction = clf.predict(iris.data[-1:])
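
The prediction is more informative next to the true label of the held-out sample. The following self-contained sketch repeats the fit from above and prints both, along with the class name looked up in target_names.

```python
from sklearn import datasets, svm

iris = datasets.load_iris()

# Train on everything except the last sample
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(iris.data[:-1], iris.target[:-1])

# Predict the held-out sample and compare with its true label
prediction = clf.predict(iris.data[-1:])
print(prediction, iris.target[-1])
print(iris.target_names[prediction])
```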


###  Setting Estimator Parameters:
**Purpose:** Set parameters for the estimator.

In [10]:
clf.set_params(kernel='linear')
clf.set_params(kernel='rbf')


###  Updating Hyperparameters:
**Purpose:** Hyper-parameters of an estimator can be updated after it has been constructed via the set_params() method. Calling fit() more than once will overwrite what was learned by any previous fit(). Here, the default kernel rbf is first changed to linear via SVC.set_params() after the estimator has been constructed, and changed back to rbf to refit the estimator and to make a second prediction.



In [16]:
from sklearn.datasets import load_iris
from sklearn.svm import SVC
X, y = load_iris(return_X_y=True)

clf = SVC()
# Switch to a linear kernel, refit, and predict
clf.set_params(kernel='linear').fit(X, y)
clf.predict(X[:5])

# Switch back to the default rbf kernel; refitting overwrites the previous fit
clf.set_params(kernel='rbf').fit(X, y)
clf.predict(X[:5])



array([0, 0, 0, 0, 0])
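
The current hyperparameters can be read back with get_params(), the counterpart of set_params(). A brief sketch, where C=10 is an arbitrary value chosen for illustration:

```python
from sklearn.svm import SVC

clf = SVC()
default_kernel = clf.get_params()["kernel"]
print(default_kernel)   # kernel used when none is specified

# set_params accepts several hyperparameters at once
clf.set_params(kernel="linear", C=10)
print(clf.get_params()["kernel"], clf.get_params()["C"])
```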

###  Handling Multiclass/Multilabel Classification:
**Purpose:** Handle multiclass or multilabel classification scenarios.

In [None]:
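from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer, MultiLabelBinarizer

OneVsRestClassifier(estimator=SVC(random_state=0))  # one binary classifier per class
LabelBinarizer().fit_transform(y)  # 1-D class labels -> 2-D binary indicator matrix
MultiLabelBinarizer().fit_transform([[label] for label in y])  # expects one collection of labels per sample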
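
The two binarizers expect differently shaped targets. In this sketch the small label lists are made up for illustration: LabelBinarizer takes one label per sample, while MultiLabelBinarizer takes a collection of labels per sample.

```python
from sklearn.preprocessing import LabelBinarizer, MultiLabelBinarizer

# Multiclass: exactly one label per sample
y_multiclass = [0, 0, 1, 1, 2]
onehot = LabelBinarizer().fit_transform(y_multiclass)
print(onehot)

# Multilabel: each sample can carry several labels
y_multilabel = [[0, 1], [0, 2], [1, 3], [0, 2, 3], [2, 4]]
indicator = MultiLabelBinarizer().fit_transform(y_multilabel)
print(indicator)
```

Both results are 2-D binary matrices with one column per class. Each row of the first contains a single 1, while rows of the second can contain several.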


### Learning and Predicting:
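**Purpose:** Explain the concept of learning and predicting using an example. An estimator learns from training data with fit() and then labels unseen data with predict(). In the cell below, model stands for any estimator, and X_train, y_train, and X_test are assumed to come from a train/test split prepared beforehand.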


In [None]:
# Fitting a model and making predictions
model.fit(X_train, y_train)
predictions = model.predict(X_test)
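
A concrete version of this pattern might look like the sketch below. It assumes the iris dataset, a default SVC as the model, and a 75/25 split made with train_test_split. The split ratio and random_state are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = SVC()
model.fit(X_train, y_train)            # learn from the training split only
predictions = model.predict(X_test)    # label the unseen samples

# Fraction of test samples predicted correctly
print((predictions == y_test).mean())
```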


### Choosing Model Parameters:
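**Purpose:** Show how to choose model parameters. Values such as C and kernel are not learned from the data, so GridSearchCV evaluates every combination in a parameter grid with cross-validation. The search only runs once fit() is called on grid_search.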


In [None]:
# Using GridSearchCV to find the best parameters
from sklearn.model_selection import GridSearchCV
params = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
grid_search = GridSearchCV(SVC(), params, cv=5)
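
Running the search and reading its result could look like the following sketch, which assumes the iris dataset as training data and reuses the same parameter grid.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

params = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
grid_search = GridSearchCV(SVC(), params, cv=5)

# Evaluates 3 x 2 = 6 parameter combinations, each with 5-fold cross-validation
grid_search.fit(X, y)

print(grid_search.best_params_)   # best combination found
print(grid_search.best_score_)    # its mean cross-validated accuracy
```

By default GridSearchCV refits the best combination on the full data, so grid_search.predict() can be used directly afterwards.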


### Refitting and Updating Parameters:
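**Purpose:** Explain how to update hyperparameters and refit the model. set_params() returns the estimator itself, so the call can be chained with fit(). X_train and y_train are assumed to come from an earlier train/test split.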


In [None]:
# Updating hyperparameters and refitting the model
clf.set_params(C=10).fit(X_train, y_train)


### Multiclass vs. Multilabel Fitting:
**Purpose:** Differentiate between multiclass and multilabel fitting.


In [None]:
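from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer

# Multiclass classification: fit with a 1-D array of class labels
OneVsRestClassifier(estimator=SVC(random_state=0))

# Multilabel classification: fit with a 2-D binary indicator matrix,
# such as the one LabelBinarizer produces
LabelBinarizer().fit_transform(y)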
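
The difference shows up in the shape of the predictions. In this sketch the tiny X and y are made-up toy data. The same classifier is fitted first on 1-D labels and then on their binarized form.

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer
from sklearn.svm import SVC

# Toy data for illustration only
X = [[1, 2], [2, 4], [4, 5], [3, 2], [3, 1]]
y = [0, 0, 1, 1, 2]

classif = OneVsRestClassifier(estimator=SVC(random_state=0))

# 1-D target: predictions are a 1-D array of class labels
multiclass_pred = classif.fit(X, y).predict(X)
print(multiclass_pred.shape)

# 2-D indicator target: predictions are a 2-D indicator matrix
y_indicator = LabelBinarizer().fit_transform(y)
multilabel_pred = classif.fit(X, y_indicator).predict(X)
print(multilabel_pred.shape)
```

With an indicator target, a row of the prediction can contain no 1 at all, which means none of the per-class classifiers claimed that sample.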


### Type casting:
**Purpose:** Explain how input data types are handled in scikit-learn. Where possible, float32 input keeps its data type; otherwise, input is cast to float64.


In [None]:
# Type casting example
import numpy as np
from sklearn import kernel_approximation

X = np.random.rand(10, 2000)
X = np.array(X, dtype='float32')
X.dtype
transformer = kernel_approximation.RBFSampler()
X_new = transformer.fit_transform(X)
X_new.dtype
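
In a notebook only the last expression of a cell is displayed, so printing both dtypes makes the comparison explicit. The sketch below is a variant of the cell above with fixed random seeds (an added assumption, for reproducibility). Whether the output stays float32 may depend on the installed scikit-learn version, so no particular result is claimed here.

```python
import numpy as np
from sklearn import kernel_approximation

# Seeded random input, cast to float32
rng = np.random.RandomState(0)
X = rng.rand(10, 2000).astype(np.float32)

transformer = kernel_approximation.RBFSampler(random_state=0)
X_new = transformer.fit_transform(X)

print(X.dtype)       # dtype of the input
print(X_new.dtype)   # dtype after the transform
print(X_new.shape)   # (n_samples, n_components)
```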
