# Overview of Scikit-Learn

Scikit-Learn is a robust, user-friendly machine learning library in Python that provides simple and efficient tools for data analysis and modeling.

1. **Versatility and Coverage:**
   - Scikit-Learn covers a wide range of machine learning algorithms, from classical models like linear regression and support vector machines to ensemble methods such as random forests and gradient boosting.
   - It supports both supervised and unsupervised learning, making it suitable for tasks such as classification, regression, clustering, and dimensionality reduction.

2. **Ease of Use and Integration:**
   - The library is designed with a clean and consistent API, making it straightforward to use and integrate with other data science tools.
   - It seamlessly integrates with other scientific computing libraries such as NumPy, SciPy, and Pandas, facilitating smooth data manipulation and analysis workflows.

3. **Community and Documentation:**
   - Scikit-Learn has a large and active community, providing extensive documentation, tutorials, and examples that can help beginners and experienced users alike.
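
The consistent interface described above can be illustrated with a minimal sketch. It assumes the bundled iris dataset and two arbitrarily chosen estimators (`LogisticRegression` and `KMeans`); neither choice comes from the original text. The supervised and unsupervised estimators are driven through the same `fit()` and `predict()` calls.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Illustrative dataset choice: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Supervised learning: the estimator is fitted on features AND labels.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)
class_predictions = clf.predict(X[:5])

# Unsupervised learning: the estimator is fitted on features only.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
km.fit(X)
cluster_assignments = km.predict(X[:5])

print(class_predictions)
print(cluster_assignments)
```

Cluster indices are arbitrary labels, so they need not line up with the class labels even when the grouping is similar.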

### Applications in Data Science

Scikit-Learn is employed in various data science applications, from predictive analytics to model evaluation and tuning.

1. **Predictive Analytics:**
   - It is widely used for building predictive models in areas like finance, healthcare, marketing, and more.
   - Examples include predicting stock prices, diagnosing diseases, and segmenting customer bases.

2. **Data Preprocessing and Feature Engineering:**
   - Scikit-Learn offers robust tools for data preprocessing, such as scaling, normalization, and imputation.
   - Feature engineering tools help create new features or transform existing ones to improve model performance.

3. **Model Evaluation and Tuning:**
   - The library provides comprehensive methods for evaluating model performance through cross-validation, metrics, and scoring functions.
   - Hyperparameter tuning can be efficiently performed using grid search and random search techniques.
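
As a rough sketch of how preprocessing, evaluation, and tuning fit together, the example below chains imputation and scaling with a model in a `Pipeline`, scores it with cross-validation, and then runs a grid search. The diabetes dataset, the `Ridge` model, the artificially injected missing values, and the `alpha` grid are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

# Blank out roughly 5% of the values so the imputer has work to do
# (purely illustrative; the original data has no missing entries).
rng = np.random.default_rng(0)
X = X.copy()
X[rng.random(X.shape) < 0.05] = np.nan

# Preprocessing and model in one estimator, so every cross-validation
# fold fits the imputer and scaler on its own training portion only.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("model", Ridge()),
])

# Model evaluation: 5-fold cross-validated R^2 scores.
cv_scores = cross_val_score(pipe, X, y, cv=5, scoring="r2")
print("Mean CV R^2:", cv_scores.mean())

# Hyperparameter tuning: exhaustive search over a small grid.
search = GridSearchCV(
    pipe,
    param_grid={"model__alpha": [0.1, 1.0, 10.0]},
    cv=5,
    scoring="r2",
)
search.fit(X, y)
best_alpha = search.best_params_["model__alpha"]
print("Best alpha:", best_alpha)
```

For larger search spaces, `RandomizedSearchCV` from the same module samples a fixed number of candidates instead of trying every combination.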

### Getting Started with Scikit-Learn

Starting with Scikit-Learn involves setting up your environment, understanding the core API, and practicing with basic examples.

1. **Environment Setup:**
   - Install Scikit-Learn using package managers like pip or conda:
     ```bash
     pip install scikit-learn
     ```
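   - Ensure you have Python installed; pip installs NumPy and SciPy automatically as dependencies. Pandas is not required by Scikit-Learn itself but is used to load data in the examples below.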

2. **Understanding the Core API:**
   - Scikit-Learn’s API follows a consistent structure: every estimator learns from data with `fit()`, predictors produce outputs with `predict()`, and transformers modify data with `transform()`.
   - Example of training a simple model:
     ```python
     from sklearn.model_selection import train_test_split
     from sklearn.linear_model import LinearRegression
     from sklearn import datasets

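     # Load a bundled regression dataset (load_boston is no longer available in scikit-learn 1.2+)
     data = datasets.load_diabetes()
     # Hold out 20% of the rows for testing
     X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)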

     # Initialize and train model
     model = LinearRegression()
     model.fit(X_train, y_train)

     # Make predictions
     predictions = model.predict(X_test)
     ```

3. **Practicing with Examples:**
   - Worked examples and full documentation are available on the [Scikit-Learn website](https://scikit-learn.org/stable/).
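
The core API section names `transform()` alongside `fit()` and `predict()`. A minimal sketch of all three together might look like the following, assuming the bundled diabetes dataset, a `StandardScaler` as the transformer, and mean squared error and R² as metrics; none of these choices is prescribed by the text above.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Transformer: fit() learns the column means and standard deviations from
# the training data; transform() applies them to any data set.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuse training statistics

# Predictor: fit() learns the coefficients; predict() produces outputs.
model = LinearRegression()
model.fit(X_train_scaled, y_train)
predictions = model.predict(X_test_scaled)

# Score the predictions on the held-out rows.
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)
print(f"MSE: {mse:.2f}, R^2: {r2:.3f}")
```

Fitting the scaler on the training split only keeps information from the test rows out of the preprocessing step.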

```python
import pandas as pd

data = "https://raw.githubusercontent.com/carlosfab/dsnp2/master/datasets/heart-disease-uci.csv"

df = pd.read_csv(data)
df.head()
```

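| Unnamed: 0 | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | num |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63.0 | 1.0 | 1.0 | 145.0 | 233.0 | 1.0 | 2.0 | 150.0 | 0.0 | 2.3 | 3.0 | 0.0 | 6.0 | 0 |
| 1 | 67.0 | 1.0 | 4.0 | 160.0 | 286.0 | 0.0 | 2.0 | 108.0 | 1.0 | 1.5 | 2.0 | 3.0 | 3.0 | 2 |
| 2 | 67.0 | 1.0 | 4.0 | 120.0 | 229.0 | 0.0 | 2.0 | 129.0 | 1.0 | 2.6 | 2.0 | 2.0 | 7.0 | 1 |
| 3 | 37.0 | 1.0 | 3.0 | 130.0 | 250.0 | 0.0 | 0.0 | 187.0 | 0.0 | 3.5 | 3.0 | 0.0 | 3.0 | 0 |
| 4 | 41.0 | 0.0 | 2.0 | 130.0 | 204.0 | 0.0 | 2.0 | 172.0 | 0.0 | 1.4 | 1.0 | 0.0 | 3.0 | 0 |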


Each row is a single observation, and each column is a variable. To create a regression model, we must split the DataFrame into a feature matrix containing the independent variables (*features*) and a target vector containing the dependent variable.
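
The split described above can be sketched as follows. To keep the example self-contained, it rebuilds the five rows shown in the table in memory instead of downloading the CSV, and it assumes that `num` is the dependent variable; the original text does not name the target column explicitly.

```python
import pandas as pd

# The five rows from the table above, rebuilt in memory so the sketch
# runs without network access.
df = pd.DataFrame({
    "age":      [63.0, 67.0, 67.0, 37.0, 41.0],
    "sex":      [1.0, 1.0, 1.0, 1.0, 0.0],
    "cp":       [1.0, 4.0, 4.0, 3.0, 2.0],
    "trestbps": [145.0, 160.0, 120.0, 130.0, 130.0],
    "chol":     [233.0, 286.0, 229.0, 250.0, 204.0],
    "fbs":      [1.0, 0.0, 0.0, 0.0, 0.0],
    "restecg":  [2.0, 2.0, 2.0, 0.0, 2.0],
    "thalach":  [150.0, 108.0, 129.0, 187.0, 172.0],
    "exang":    [0.0, 1.0, 1.0, 0.0, 0.0],
    "oldpeak":  [2.3, 1.5, 2.6, 3.5, 1.4],
    "slope":    [3.0, 2.0, 2.0, 3.0, 1.0],
    "ca":       [0.0, 3.0, 2.0, 0.0, 0.0],
    "thal":     [6.0, 3.0, 7.0, 3.0, 3.0],
    "num":      [0, 2, 1, 0, 0],
})

# Assumption: "num" is the target; every other column is a feature.
X = df.drop(columns=["num"])  # feature matrix, shape (n_samples, n_features)
y = df["num"]                 # target vector, shape (n_samples,)

print(X.shape, y.shape)
```

With the full dataset loaded through `pd.read_csv`, the same two lines apply unchanged; if the loaded DataFrame contains an extra index column such as `Unnamed: 0`, it would need to be dropped from `X` as well.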