
# Collection of scikit-learn commands for machine learning



###  Importing Modules:
**Purpose:** Import necessary modules from scikit-learn.

datasets: This module provides loaders for small example datasets that are commonly used in machine learning tasks.

svm: This module contains implementations of Support Vector Machine (SVM) algorithms, supervised learning models used for classification and regression tasks.

SVC: This class implements Support Vector Classification (SVC), the SVM variant used for classification tasks. SVC separates classes by finding the hyperplane that maximizes the margin between them in the feature space.

In [6]:
from sklearn import datasets
from sklearn import svm
from sklearn.svm import SVC


###  Loading Example Datasets:
**Purpose:** Load example datasets provided by scikit-learn.


In [7]:
iris = datasets.load_iris()
digits = datasets.load_digits()


### Loading External Datasets:
**Purpose:** Load datasets from external sources.

In Google Colab, you can upload files directly to the runtime environment:

1. Click on the "Files" tab on the left sidebar in Google Colab.
2. Click on the "Upload to session storage" icon.
3. Select the file from your local system and upload it.

Once the file is uploaded, you can access it using its filename directly in your code. There's no need to specify a path, as the file is uploaded to the current working directory.

For example, after uploading 'payment_fraud.csv', you can read it into a DataFrame using pd.read_csv('payment_fraud.csv').

In [10]:
# Use pandas to load CSV data
import pandas as pd
data = pd.read_csv('payment_fraud.csv')
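
As a rough sketch of what usually follows pd.read_csv, the snippet below reads a small in-memory CSV and splits it into features and a target. The column names (amount, num_items, label) and the values are invented for illustration; the actual columns of 'payment_fraud.csv' are not shown in this notebook.

```python
import io
import pandas as pd

# Hypothetical stand-in for an uploaded CSV file; columns and values are invented.
csv_text = """amount,num_items,label
12.5,1,0
250.0,3,1
40.0,2,0
"""

data = pd.read_csv(io.StringIO(csv_text))

# Separate the feature columns from the (assumed) target column.
X = data.drop(columns=["label"])
y = data["label"]

print(data.shape)
print(list(X.columns))
```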


### Accessing Dataset Attributes:
**Purpose:** Access the attributes of the loaded datasets.

In [None]:
print(iris.data)
print(iris.target)
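
Printing the raw arrays can be hard to read, so a short sketch of the other attributes on the loaded dataset object may help. It only uses attributes that load_iris is known to return.

```python
from sklearn import datasets

iris = datasets.load_iris()

# data is a 2-D array: one row per sample, one column per feature.
print(iris.data.shape)      # (150, 4)

# target holds one integer class label per sample.
print(iris.target.shape)    # (150,)

# Human-readable names for the columns and the classes.
print(iris.feature_names)
print(iris.target_names)
```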



###  Creating Estimator:
**Purpose:** Create an estimator for classification using Support Vector Classification (SVC).

gamma: This parameter, used by the default RBF kernel, controls how far the influence of a single training example reaches. A low value of gamma gives each example a wide reach and produces a smoother decision boundary, while a high value keeps the influence local and makes the decision boundary more jagged.

C: This parameter controls the trade-off between achieving a low training error and keeping the margin around the decision boundary as wide as possible. A small value of C tolerates more misclassified training points and makes the decision boundary smoother, while a large value lets the decision boundary fit the training data more closely, possibly resulting in overfitting.

So, when you execute clf = svm.SVC(gamma=0.001, C=100.), you're creating an instance of the SVC classifier with the specified parameter values for gamma and C. This instance (clf) can then be used to fit the model to training data and make predictions on new data.


In [None]:
clf = svm.SVC(gamma=0.001, C=100.)
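
To make the effect of gamma more concrete, here is a minimal sketch that trains the same classifier with a low and a high gamma and compares training and test accuracy on the digits dataset. The second gamma value (1.0), the 75/25 split, and random_state=0 are choices made for this illustration, not values from the notebook.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

scores = {}
for gamma in (0.001, 1.0):
    clf = svm.SVC(gamma=gamma, C=100.)
    clf.fit(X_train, y_train)
    # Compare accuracy on data the model has seen with data it has not.
    scores[gamma] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(gamma, scores[gamma])
```

A large gap between the two numbers for a given gamma is the usual sign of overfitting.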


###  Fitting Estimator:
**Purpose:** Fit the estimator to the training data.


In [None]:
clf.fit(digits.data[:-1], digits.target[:-1])


###  Predicting with Estimator:
**Purpose:** Use the fitted estimator to make predictions.


In [None]:
prediction = clf.predict(digits.data[-1:])
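
The fit and predict steps can be put together in one self-contained sketch that also compares the prediction with the true label of the held-out sample. It assumes the same digits data and parameter values used above.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
clf = svm.SVC(gamma=0.001, C=100.)

# Train on every sample except the last one.
clf.fit(digits.data[:-1], digits.target[:-1])

# predict expects a 2-D array, so slice with [-1:] rather than index with [-1].
prediction = clf.predict(digits.data[-1:])

print("predicted:", prediction[0])
print("actual:   ", digits.target[-1])
```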


###  Setting Estimator Parameters:
**Purpose:** Set parameters for the estimator.

In [None]:
clf.set_params(kernel='linear')  # switch to a linear kernel
clf.set_params(kernel='rbf')     # switch back to the RBF kernel (the default)
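
One way to confirm that a parameter change took effect is to read it back with get_params. The sketch below assumes a freshly created SVC rather than the fitted clf from the earlier cells.

```python
from sklearn.svm import SVC

clf = SVC(gamma=0.001, C=100.)
print(clf.get_params()["kernel"])   # default kernel: rbf

# set_params returns the estimator itself, so calls can be chained.
same_object = clf.set_params(kernel="linear")
print(same_object is clf)
print(clf.get_params()["kernel"])
```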


###  Updating Hyperparameters:
**Purpose:** Update hyperparameters of the estimator.

In [None]:
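X, y = digits.data, digits.target  # features and labels from the digits dataset loaded earlier
clf.set_params(kernel='linear').fit(X, y)  # change the kernel, then refit
clf.set_params(kernel='rbf').fit(X, y)     # change it back and refit again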


###  Handling Multiclass/Multilabel Classification:
**Purpose:** Handle multiclass or multilabel classification scenarios.

In [None]:
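from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer, MultiLabelBinarizer

OneVsRestClassifier(estimator=SVC(random_state=0))  # trains one binary SVC per class
LabelBinarizer().fit_transform(iris.target)  # 1-D class labels -> 2-D binary indicator matrix
MultiLabelBinarizer().fit_transform([[0, 1], [0, 2], [1]])  # illustrative label sets, one list per sample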
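
To show what the two binarizers actually produce, here is a small sketch on toy labels. The label values are made up for illustration.

```python
from sklearn.preprocessing import LabelBinarizer, MultiLabelBinarizer

# Multiclass: exactly one label per sample.
y_multiclass = [0, 0, 1, 1, 2]
onehot = LabelBinarizer().fit_transform(y_multiclass)
print(onehot)

# Multilabel: a collection of labels per sample.
y_multilabel = [[0, 1], [0, 2], [1, 3], [0, 2, 3], [2, 4]]
indicator = MultiLabelBinarizer().fit_transform(y_multilabel)
print(indicator)
```

Each row of onehot contains a single 1, while a row of indicator can contain several, one for every label assigned to that sample.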


### Learning and Predicting:
**Purpose:** Explain the concept of learning and predicting using an example.


In [None]:
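# Fitting a model and making predictions
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
model = SVC()  # any scikit-learn estimator follows the same fit/predict pattern
model.fit(X_train, y_train)
predictions = model.predict(X_test)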
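
Predictions are only useful once they are compared against known labels. The following sketch holds out part of the iris data and measures accuracy on it; the 70/30 split, the stratification, and random_state=0 are assumptions chosen for this example.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = SVC()
model.fit(X_train, y_train)          # learn from the training portion
predictions = model.predict(X_test)  # predict labels for unseen samples

# Fraction of held-out samples whose predicted label matches the true label.
accuracy = accuracy_score(y_test, predictions)
print(f"test accuracy: {accuracy:.2f}")
```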


### Choosing Model Parameters:
**Purpose:** Discuss the importance of choosing model parameters.


In [None]:
# Using GridSearchCV to find the best parameters
from sklearn.model_selection import GridSearchCV
params = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
grid_search = GridSearchCV(SVC(), params, cv=5)
grid_search.fit(iris.data, iris.target)  # the search only runs once fit is called
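
Once a grid search has been fitted, its results can be inspected through a few attributes. This is a minimal sketch using the same parameter grid on the iris dataset; the choice of dataset is an assumption made for the example.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

params = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
grid_search = GridSearchCV(SVC(), params, cv=5)
grid_search.fit(X, y)  # cross-validates every combination in the grid

print(grid_search.best_params_)  # combination with the highest mean CV score
print(grid_search.best_score_)   # that mean cross-validated accuracy

# By default the best combination is refitted on all the data.
best_model = grid_search.best_estimator_
print(best_model.predict(X[:3]))
```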


### Refitting and Updating Parameters:
**Purpose:** Explain how to update hyperparameters and refit the model.


In [None]:
# Updating hyperparameters and refitting the model
clf.set_params(C=10).fit(digits.data[:-1], digits.target[:-1])
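
It may be worth seeing why the refit is needed. In the sketch below, which assumes the iris dataset and a default SVC, changing C with set_params leaves the fitted model as it was until fit is called again.

```python
from sklearn import datasets
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

clf = SVC(C=1.0).fit(X, y)
n_before = clf.support_vectors_.shape[0]

# Changing the hyperparameter alone leaves the fitted state untouched.
clf.set_params(C=10)
n_after_set = clf.support_vectors_.shape[0]

# Refitting trains a new model with the updated value.
clf.fit(X, y)
n_after_fit = clf.support_vectors_.shape[0]

print(n_before, n_after_set, n_after_fit)
```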


### Multiclass vs. Multilabel Fitting:
**Purpose:** Differentiate between multiclass and multilabel fitting.


In [None]:
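# Multiclass classification: 1-D targets, one label per sample
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer
OneVsRestClassifier(estimator=SVC(random_state=0))

# Multilabel-style targets: a 2-D binary indicator matrix, one column per class
LabelBinarizer().fit_transform(iris.target)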
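
The practical difference shows up in the shape of what predict returns. Here is a minimal sketch on a toy dataset, where the X and y values are invented for illustration.

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer
from sklearn.svm import SVC

X = [[1, 2], [2, 4], [4, 5], [3, 2], [3, 1]]
y = [0, 0, 1, 1, 2]

classif = OneVsRestClassifier(estimator=SVC(random_state=0))

# 1-D targets: predict returns one class label per sample.
pred_multiclass = classif.fit(X, y).predict(X)
print(pred_multiclass.shape)

# 2-D binary targets: predict returns one indicator row per sample.
y_binary = LabelBinarizer().fit_transform(y)
pred_indicator = classif.fit(X, y_binary).predict(X)
print(pred_indicator.shape)
```

With indicator targets, a predicted row may contain no 1 at all, meaning none of the per-class classifiers claimed that sample.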


### Type casting:
**Purpose:** Explain how input data types are handled in scikit-learn.


In [None]:
# Type casting example
import numpy as np
from sklearn import kernel_approximation

X = np.random.rand(10, 2000)
X = np.array(X, dtype='float32')
print(X.dtype)  # float32
transformer = kernel_approximation.RBFSampler()
X_new = transformer.fit_transform(X)
# float32 is kept where the estimator supports it; otherwise the output is cast to float64
print(X_new.dtype)
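
Targets are handled differently from input features: for classifiers, the type of the labels passed to fit is the type that predict returns. A short sketch on the iris dataset, assuming a default SVC:

```python
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
clf = SVC()

# Integer targets in, integer predictions out.
clf.fit(iris.data, iris.target)
int_preds = clf.predict(iris.data[:3])
print(int_preds)

# String targets in, string predictions out.
clf.fit(iris.data, iris.target_names[iris.target])
str_preds = clf.predict(iris.data[:3])
print(str_preds)
```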
