**16. Procedure for Implementation of sklearn**

This section outlines the procedure for implementing machine learning models with scikit-learn, a powerful Python library for machine learning.

**Installation:**
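Ensure you have a supported version of Python 3 installed. The scikit-learn installation guide lists the minimum version for each release, and recent releases no longer support Python 3.8.
scikit-learn depends on NumPy and SciPy. pip and conda install these dependencies automatically if they are missing.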
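To install scikit-learn, use either of the following commands:

**Using pip:**
!pip install -U scikit-learn
(The leading `!` runs the command from a notebook cell. Omit it when typing the command in a terminal.)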

**Using conda:**
conda install -c conda-forge scikit-learn
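A quick way to confirm that the installation worked is to import the packages and print their versions. This is a minimal check. The version numbers printed will depend on your environment.

```python
# Confirm that scikit-learn and its core dependencies import correctly
import numpy
import scipy
import sklearn

print("scikit-learn:", sklearn.__version__)
print("NumPy:", numpy.__version__)
print("SciPy:", scipy.__version__)
```

For a more detailed report, including the Python version and linked libraries, `sklearn.show_versions()` prints the full environment.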

**Loading a Dataset:**

**A dataset consists of two main components:**

**Features:**
These are the variables (predictors or attributes) of your data. They can be represented by a feature matrix (commonly denoted as ‘X’).

**Response:**
This is the output variable that depends on the feature variables. It is represented by a response vector (commonly denoted as ‘y’).
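The convention is easiest to see with a tiny hand-made example. The values below are invented purely for illustration. The point is that `X` is a 2-D array of shape `(n_samples, n_features)` and `y` is a 1-D array with one entry per sample.

```python
import numpy as np

# Feature matrix X: one row per sample, one column per feature
# (4 hypothetical samples with 2 features each)
X = np.array([
    [5.1, 3.5],
    [4.9, 3.0],
    [6.3, 3.3],
    [5.8, 2.7],
])

# Response vector y: one target value per sample (here, class labels 0 or 1)
y = np.array([0, 0, 1, 1])

print("X shape (n_samples, n_features):", X.shape)  # (4, 2)
print("y shape (n_samples,):", y.shape)  # (4,)
```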

Scikit-learn comes with example datasets such as iris (for classification) and diabetes (for regression). The Boston house-prices dataset used in older tutorials was removed in scikit-learn 1.2. You can load the iris dataset as follows:

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data  # Feature matrix
y = iris.target  # Response vector
feature_names = iris.feature_names
target_names = iris.target_names
print("Feature names:", feature_names)
print("Target names:", target_names)
print("\nType of X is:", type(X))
print("\nFirst 5 rows of X:\n", X[:5])
```


```
Feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Target names: ['setosa' 'versicolor' 'virginica']

Type of X is: <class 'numpy.ndarray'>

First 5 rows of X:
 [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
```
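The bundled loaders can also return `X` and `y` directly through the `return_X_y=True` argument. The sketch below uses this for iris and for the diabetes dataset, which ships with scikit-learn and has a continuous target. Here it stands in for the regression example, because the Boston data is no longer included.

```python
from sklearn.datasets import load_diabetes, load_iris

# return_X_y=True returns the feature matrix and response vector as a tuple
X_iris, y_iris = load_iris(return_X_y=True)
print("Iris:", X_iris.shape, y_iris.shape)  # (150, 4) (150,)

# Diabetes: 10 numeric features and a continuous (regression) target
X_diab, y_diab = load_diabetes(return_X_y=True)
print("Diabetes:", X_diab.shape, y_diab.shape)  # (442, 10) (442,)
```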


To load an external dataset, such as a CSV file, use the pandas library (`pd.read_csv`) and then separate the feature columns from the target column.
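A minimal sketch of that workflow follows. The CSV content and the column names `age`, `income`, and `purchased` are hypothetical. The data is read from an in-memory string so the example runs without a file, but in practice you would pass a file path to `pd.read_csv`.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV content; normally you would call pd.read_csv("your_file.csv")
csv_text = """age,income,purchased
25,40000,0
32,52000,1
47,61000,1
51,58000,0
"""

df = pd.read_csv(StringIO(csv_text))

# Feature matrix: every column except the target; response vector: the target
X = df.drop(columns=["purchased"])
y = df["purchased"]

print(X.shape, y.shape)  # (4, 2) (4,)
```

scikit-learn estimators accept pandas DataFrames and Series directly, so no conversion to NumPy is needed before calling `fit`.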

**Model Building:**
Scikit-learn provides a unified interface for various machine learning algorithms, including classification, regression, and clustering.
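Because estimators share the same methods, switching algorithms is mostly a matter of changing the constructor call. The sketch below uses the bundled iris data. The hyperparameters (`random_state`, `n_clusters`, `n_init`) are illustrative choices, not recommendations. It trains two classifiers and one clustering model with the same calls.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Supervised estimators share fit(X, y) and predict(X),
# so swapping algorithms only changes the constructor call
for clf in (SVC(), RandomForestClassifier(random_state=0)):
    clf.fit(X, y)
    print(type(clf).__name__, clf.predict(X[:3]))

# Unsupervised estimators such as KMeans are fitted with X only (no y)
km = KMeans(n_clusters=3, n_init=10, random_state=0)
km.fit(X)
print("KMeans cluster labels:", km.predict(X[:3]))
```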

To build a model, choose an appropriate algorithm (e.g., SVM, random forests, or k-means) and follow these steps:

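**Initialize the estimator:** Import the estimator class and create an instance of it. For example, use `from sklearn.svm import SVC` followed by `clf = SVC()` for support vector classification.

**Fit the model:** Use `estimator.fit(X, y)` to train the model on the feature matrix `X` and response vector `y`.

**Make predictions:** Use `estimator.predict(T)` to predict outcomes for new data `T`.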
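The three steps can be sketched end to end on the iris data. The new samples in `T` are hypothetical flower measurements chosen for illustration. `SVC()` is used with its default hyperparameters.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 1. Initialize the estimator (default hyperparameters)
clf = SVC()

# 2. Fit the model on the feature matrix X and response vector y
clf.fit(X, y)

# 3. Predict outcomes for new data T (two hypothetical flowers, 4 features each)
T = [[5.0, 3.4, 1.5, 0.2],
     [6.7, 3.0, 5.2, 2.3]]
print(clf.predict(T))
```

The predictions are class indices. `load_iris().target_names` maps them back to species names.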

Remember, scikit-learn simplifies the process of building and evaluating machine learning models, making it accessible to everyone!

**17. Write a Python program using Scikit-learn to split the dataset - prediction.csv**

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the CSV file
df = pd.read_csv("prediction.csv")

# Features (X): every column except the target; target (y): "bmi"
# (replace "bmi" if your file uses a different target column)
X = df.drop(columns=["bmi"])
y = df["bmi"]

# Split the data (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Use X_train, y_train for training and X_test, y_test for testing
print(f"Training data shape: X_train={X_train.shape}, y_train={y_train.shape}")
print(f"Testing data shape: X_test={X_test.shape}, y_test={y_test.shape}")
```
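Since `prediction.csv` is not reproduced here, the sketch below builds a small hypothetical DataFrame with a `bmi` column. The other columns and all values are random and illustrative. It shows how many rows end up in each set and that the split preserves the DataFrame index without overlap.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for prediction.csv: 10 rows, two features plus "bmi"
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(20, 70, size=10),
    "weight": rng.normal(70, 10, size=10),
    "bmi": rng.normal(25, 3, size=10),
})

X = df.drop(columns=["bmi"])
y = df["bmi"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# With test_size=0.2, 2 of the 10 rows go to the test set
print(len(X_train), len(X_test))  # 8 2

# Rows keep their original index labels, and no row appears in both sets
print(set(X_train.index).isdisjoint(X_test.index))  # True
```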


**18. Write a Python program using Scikit-learn to split the dataset - Advertising.csv**

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Read the dataset from the CSV file
data = pd.read_csv('Advertising.csv')

# Split the data into features (X) and target (y)
X = data.drop(columns=['Sales'])  # Assuming 'Sales' is the target variable
y = data['Sales']

# Split the data into 70% training and 30% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Print the shapes of the resulting datasets
print(f"Training data shape: X_train={X_train.shape}, y_train={y_train.shape}")
print(f"Testing data shape: X_test={X_test.shape}, y_test={y_test.shape}")
```

```
Training data shape: X_train=(140, 4), y_train=(140,)
Testing data shape: X_test=(60, 4), y_test=(60,)
```
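The `random_state` argument makes the split reproducible. The sketch below uses arbitrary dummy arrays with the same total size as the Advertising data above (200 rows, 4 feature columns). It shows that repeating a split with the same seed selects exactly the same rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data: 200 samples with 4 features (values are arbitrary)
X = np.arange(800).reshape(200, 4)
y = np.arange(200)

# Two splits with the same random_state select exactly the same rows
a_train, a_test, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)
b_train, b_test, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)
print(np.array_equal(a_test, b_test))  # True

# A different random_state generally produces a different split
c_train, c_test, _, _ = train_test_split(X, y, test_size=0.3, random_state=7)
print(np.array_equal(a_test, c_test))
```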


**19. Write a Python program using Scikit-learn to split the dataset - iris.csv**

```python
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Load the Iris dataset bundled with scikit-learn (used here instead of reading iris.csv)
iris = load_iris()

# Split the data into features (X) and target (y)
X = iris.data
y = iris.target

# Split the data into 50% training and 50% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

# Print the shapes of the resulting datasets
print(f"Training data shape: X_train={X_train.shape}, y_train={y_train.shape}")
print(f"Testing data shape: X_test={X_test.shape}, y_test={y_test.shape}")
```



```
Training data shape: X_train=(75, 4), y_train=(75,)
Testing data shape: X_test=(75, 4), y_test=(75,)
```
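For a classification dataset like iris, you can pass `stratify=y` so that both halves keep the same class proportions as the full dataset. This is an optional variation on the split above, not part of the original exercise.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y preserves the class proportions of y in both halves
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42, stratify=y
)

# Iris has 50 samples per class, so each half gets 25 of each class
print("Train class counts:", np.bincount(y_train))  # [25 25 25]
print("Test class counts:", np.bincount(y_test))    # [25 25 25]
```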


```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Read the dataset from the CSV file
df = pd.read_csv('Real estate.csv')

# Assuming 'price' is the target variable, split the data into features (X) and target (y)
X = df.drop(columns=['price'])
y = df['price']

# Split the data into 70% training and 30% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Print the shapes of the resulting datasets
print(f"Training data shape: X_train={X_train.shape}, y_train={y_train.shape}")
print(f"Testing data shape: X_test={X_test.shape}, y_test={y_test.shape}")
```


```
Training data shape: X_train=(289, 7), y_train=(289,)
Testing data shape: X_test=(125, 7), y_test=(125,)
```
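The uneven 289/125 split follows from how the current scikit-learn implementation rounds. For a float `test_size`, the test set gets `ceil(test_size * n_samples)` rows and the training set gets the remainder. Here 0.3 × 414 = 124.2 rounds up to 125. The sketch below checks this arithmetic against `train_test_split` on a dummy array with the same shape as the real-estate features.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples, test_size = 414, 0.3  # 289 + 125 rows, as in the output above

# Test set: ceil(test_size * n_samples); training set: the remainder
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)  # 289 125

# Check against train_test_split on a dummy array with 7 feature columns
X = np.zeros((n_samples, 7))
X_train, X_test = train_test_split(X, test_size=test_size, random_state=42)
print(X_train.shape, X_test.shape)  # (289, 7) (125, 7)
```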
