# Module 1

* Artificial intelligence (AI) aims to simulate human cognition, while machine learning (ML), a subset of AI, uses algorithms that learn from data and typically relies on feature engineering.

* Machine learning includes different types of models: supervised learning, which uses labeled data to make predictions; unsupervised learning, which finds patterns in unlabeled data; and semi-supervised learning, which trains on a small subset of labeled data combined with a larger pool of unlabeled data.

* Key factors for choosing a machine learning technique include the type of problem to be solved, the available data, available resources, and the desired outcome.

* Machine learning techniques include anomaly detection for identifying unusual cases like fraud, classification for categorizing new data, regression for predicting continuous values, and clustering for grouping similar data points without labels.

* Machine learning tools support pipelines with modules for data preprocessing, model building, evaluation, optimization, and deployment.

* R is commonly used in machine learning for statistical analysis and data exploration, while Python offers a vast array of libraries for different machine learning tasks. Other programming languages used in ML include Julia, Scala, Java, and JavaScript, each suited to specific applications like high-performance computing and web-based ML models.

* Data visualization tools such as Matplotlib and Seaborn create customizable plots, ggplot2 enables building graphics in layers, and Tableau provides interactive data dashboards.

* Python libraries commonly used in machine learning include NumPy for numerical computations, Pandas for data analysis and preparation, SciPy for scientific computing, and Scikit-learn for building traditional machine learning models.

* Deep learning frameworks such as TensorFlow, Keras, Theano, and PyTorch support the design, training, and testing of neural networks used in areas like computer vision and natural language processing.

* Computer vision tools enable applications like object detection, image classification, and facial recognition, while natural language processing (NLP) tools like NLTK, TextBlob, and Stanza facilitate text processing, sentiment analysis, and language parsing.

* Generative AI tools use artificial intelligence to create new content, including text, images, music, and other media, based on input data or prompts.

* Scikit-learn provides a range of functions, including classification, regression, clustering, data preprocessing, model evaluation, and exporting models for production use.

* The machine learning ecosystem includes a network of tools, frameworks, libraries, platforms, and processes that collectively support the development and management of machine learning models.
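
The pipeline stages named above (data preprocessing, model building, evaluation) can be sketched with scikit-learn. This is a minimal illustration rather than a prescribed workflow: the synthetic dataset, the choice of `LogisticRegression`, and the variable names are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic labeled data (assumed for illustration): 500 samples, 5 features.
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=42
)

# Hold out 20% of the data to evaluate the trained model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Preprocessing and model building chained into one estimator.
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X_train, y_train)

# Evaluation on the held-out split.
y_pred = pipeline.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

Chaining the scaler and the classifier means the scaling statistics are learned from the training split only, which avoids leaking information from the test split.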

# Module 2

* Regression models the relationship between a continuous target variable and one or more explanatory features; it comes in simple and multiple forms.

* Simple regression uses a single independent variable to estimate a dependent variable, while multiple regression involves more than one independent variable.

* Regression is widely applicable, from forecasting sales and estimating maintenance costs to predicting rainfall and disease spread.

* In simple linear regression, a best-fit line minimizes errors, measured by Mean Squared Error (MSE); this approach is known as Ordinary Least Squares (OLS).

* OLS regression is easy to interpret but sensitive to outliers, which can impact accuracy.

* Multiple linear regression extends simple linear regression by using multiple variables to predict outcomes and analyze variable relationships.

* Adding too many variables can lead to overfitting, so careful variable selection is necessary to build a balanced model.

* Nonlinear regression models complex relationships using polynomial, exponential, or logarithmic functions when data does not fit a straight line.

* Polynomial regression can fit nonlinear data but may overfit by capturing random noise rather than underlying patterns.

* Logistic regression is a probability predictor and binary classifier, suitable for binary targets and assessing feature impact.

* Logistic regression is trained by minimizing log-loss, typically with gradient descent, or with stochastic gradient descent for efficiency on large datasets.

* Gradient descent is an iterative process to minimize the cost function, which is crucial for training logistic regression models.
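
To make the last two points concrete, here is a minimal NumPy sketch of batch gradient descent minimizing log-loss for a logistic regression model. The synthetic data, the learning rate of 0.1, the 500 iterations, and the helper names `sigmoid` and `log_loss_value` are assumptions chosen for illustration, not values taken from the course.

```python
import numpy as np


def sigmoid(z):
    """Map log-odds to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def log_loss_value(y, p, eps=1e-12):
    """Mean log-loss; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


# Synthetic binary data (assumed): labels drawn from a known logistic model.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = (rng.random(n) < sigmoid(0.5 + X @ np.array([2.0, -1.0]))).astype(float)

# Prepend a column of ones so w[0] acts as the intercept b0.
Xb = np.hstack([np.ones((n, 1)), X])
w = np.zeros(3)
learning_rate = 0.1
losses = []

for _ in range(500):
    p = sigmoid(Xb @ w)                  # current predicted probabilities
    losses.append(log_loss_value(y, p))  # cost before this update
    gradient = Xb.T @ (p - y) / n        # gradient of mean log-loss w.r.t. w
    w -= learning_rate * gradient        # step against the gradient

print(f"log-loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
print("weights [b0, b1, b2]:", np.round(w, 2))
```

With all weights at zero every prediction is 0.5, so the first recorded loss equals ln 2 (about 0.6931); each iteration then moves the weights in the direction that lowers the cost. Stochastic gradient descent would compute the same gradient on a random subset of rows per step instead of the full dataset.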


| Model Name | Description | Code Syntax |
| --- | --- | --- |
| Simple linear regression | Purpose: To predict a dependent variable based on one independent variable.<br>Pros: Easy to implement, interpret, and efficient for small datasets.<br>Cons: Not suitable for complex relationships; prone to underfitting.<br>Modeling equation: `y = b0 + b1*x` | `from sklearn.linear_model import LinearRegression`<br>`model = LinearRegression()`<br>`model.fit(X, y)` |
| Polynomial regression | Purpose: To capture nonlinear relationships between variables.<br>Pros: Better at fitting nonlinear data compared to linear regression.<br>Cons: Prone to overfitting with high-degree polynomials.<br>Modeling equation: `y = b0 + b1*x + b2*x^2 + ...` | `from sklearn.preprocessing import PolynomialFeatures`<br>`from sklearn.linear_model import LinearRegression`<br>`poly = PolynomialFeatures(degree=2)`<br>`X_poly = poly.fit_transform(X)`<br>`model = LinearRegression().fit(X_poly, y)` |
| Multiple linear regression | Purpose: To predict a dependent variable based on multiple independent variables.<br>Pros: Accounts for multiple factors influencing the outcome.<br>Cons: Assumes a linear relationship between predictors and target.<br>Modeling equation: `y = b0 + b1*x1 + b2*x2 + ...` | `from sklearn.linear_model import LinearRegression`<br>`model = LinearRegression()`<br>`model.fit(X, y)` |
| Logistic regression | Purpose: To predict probabilities of categorical outcomes.<br>Pros: Efficient for binary classification problems.<br>Cons: Assumes a linear relationship between independent variables and log-odds.<br>Modeling equation: `log(p/(1-p)) = b0 + b1*x1 + ...` | `from sklearn.linear_model import LogisticRegression`<br>`model = LogisticRegression()`<br>`model.fit(X, y)` |
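
As a hedged illustration of the simple linear regression row above, the sketch below computes the intercept `b0` and slope `b1` with the closed-form ordinary least squares formulas and compares them with what `LinearRegression` returns. The synthetic data (roughly y = 1 + 2x plus noise) and the variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: a noisy straight line with intercept 1 and slope 2.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=50)

# Closed-form OLS estimates for y = b0 + b1*x.
x_mean, y_mean = x.mean(), y.mean()
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b0 = y_mean - b1 * x_mean

# scikit-learn expects a 2-D feature matrix, hence the reshape.
model = LinearRegression().fit(x.reshape(-1, 1), y)

print(f"closed form : b0={b0:.4f}, b1={b1:.4f}")
print(f"scikit-learn: b0={model.intercept_:.4f}, b1={model.coef_[0]:.4f}")
```

Both approaches minimize the same mean squared error, so the two pairs of coefficients should agree up to floating-point precision.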

| Function/Method Name | Brief Description | Code Syntax |
| --- | --- | --- |
| train_test_split | Splits the dataset into training and testing subsets to evaluate the model's performance. | `from sklearn.model_selection import train_test_split`<br>`X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)` |
| StandardScaler | Standardizes features by removing the mean and scaling to unit variance. | `from sklearn.preprocessing import StandardScaler`<br>`scaler = StandardScaler()`<br>`X_scaled = scaler.fit_transform(X)` |
| log_loss | Calculates the logarithmic loss, a performance metric for classification models. | `from sklearn.metrics import log_loss`<br>`loss = log_loss(y_true, y_pred_proba)` |
| mean_absolute_error | Calculates the mean absolute error between actual and predicted values. | `from sklearn.metrics import mean_absolute_error`<br>`mae = mean_absolute_error(y_true, y_pred)` |
| mean_squared_error | Computes the mean squared error between actual and predicted values. | `from sklearn.metrics import mean_squared_error`<br>`mse = mean_squared_error(y_true, y_pred)` |
| root_mean_squared_error | Calculates the root mean squared error (RMSE), a commonly used metric for regression tasks. | `from sklearn.metrics import mean_squared_error`<br>`import numpy as np`<br>`rmse = np.sqrt(mean_squared_error(y_true, y_pred))` |
| r2_score | Computes the R-squared value, indicating how well the model explains the variability of the target variable. | `from sklearn.metrics import r2_score`<br>`r2 = r2_score(y_true, y_pred)` |
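
The regression metrics listed above can be checked by hand on a tiny example. The four actual and predicted values below are made up for illustration; the expected values in the comments follow directly from the definitions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values; the errors are 0.5, -0.5, 0 and -1.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0.5 + 0 + 1) / 4 = 0.5
mse = mean_squared_error(y_true, y_pred)   # (0.25 + 0.25 + 0 + 1) / 4 = 0.375
rmse = np.sqrt(mse)                        # sqrt(0.375), about 0.612
r2 = r2_score(y_true, y_pred)              # 1 - 1.5 / 29.1875, about 0.949

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```

MAE and RMSE are in the same units as the target, and RMSE penalizes the single large error more heavily than MAE does; R-squared compares the residual sum of squares (1.5) with the total variation of `y_true` around its mean (29.1875).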
