> # <b>TLDR Notes:</b> Understanding ML, workflow and tech stack, step-by-step.

# Data Collection

#### Data collection is the first and foundational step in the machine learning pipeline. It involves gathering `raw data` from various sources, which could include `sensors, logs, databases, datasets, user inputs,` or `online repositories`. The quality and quantity of the collected data can significantly influence the performance of ML models. Techniques range from `simple data scraping` to `complex data streaming`, and commonly used technologies include SQL for databases, APIs for web services, and specialized hardware for sensors.
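As a minimal sketch of collecting data from a database with SQL, the snippet below uses Python's built-in `sqlite3` module with an in-memory database. The `readings` table and its values are hypothetical, invented only to show the query step; a real pipeline would connect to an existing database instead.

```python
import sqlite3

# Hypothetical table of sensor readings, created in memory purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("s1", 21.5), ("s2", 19.0), ("s1", 22.1), ("s3", None)],
)
conn.commit()

# Collect the raw rows with a plain SQL query; missing values come back as None.
rows = conn.execute(
    "SELECT sensor_id, temperature FROM readings ORDER BY rowid"
).fetchall()
conn.close()

print(rows)  # [('s1', 21.5), ('s2', 19.0), ('s1', 22.1), ('s3', None)]
```

Note that the raw data already contains a gap (`None`), which is exactly what the next step has to deal with.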

# Data Preparation

#### This process involves cleaning and transforming raw data into a format that ML algorithms can understand. It's about `handling missing values`, `encoding categorical variables`, `normalizing` or `scaling numerical values`, and `feature engineering`. Tools like `pandas` in Python are commonly used. Proper data preparation can greatly enhance model accuracy and is often considered the most time-consuming part of the ML process.
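A minimal pandas sketch of the preparation steps listed above, run on a small invented table. The column names and values are hypothetical, and median imputation and min-max scaling are just one common choice for each step, not the only options.

```python
import pandas as pd

# Hypothetical raw data: one missing size and one categorical column.
raw = pd.DataFrame({
    "city": ["Paris", "Tokyo", "Paris", "Lima"],
    "size_m2": [50.0, None, 80.0, 65.0],
    "price": [300, 450, 480, 200],
})
df = raw.copy()

# 1. Handle missing values: fill the gap with the column median (65.0 here).
df["size_m2"] = df["size_m2"].fillna(df["size_m2"].median())

# 2. Feature engineering: derive a new feature from existing columns.
df["price_per_m2"] = df["price"] / df["size_m2"]

# 3. Encode the categorical column as one-hot indicator columns (city_Lima, city_Paris, city_Tokyo).
df = pd.get_dummies(df, columns=["city"], dtype=int)

# 4. Min-max scale a numerical column into the range [0, 1].
size = df["size_m2"]
df["size_m2"] = (size - size.min()) / (size.max() - size.min())

print(df)
```

In a real project, the median and the min/max would be computed on the training split only and then reused on new data, so that test data does not leak into preprocessing.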

# Machine Learning Algorithms
#### These are the sets of rules and techniques that allow computers to find patterns in data and make decisions based on them. They range from simple `linear regression` and `decision trees` to `complex neural networks` and `ensemble methods`; each algorithm has its strengths and is chosen based on the specific problem and data type. Libraries like [scikit-learn](https://scikit-learn.org/stable/), [TensorFlow](https://www.tensorflow.org), and [PyTorch](https://pytorch.org) provide implementations of these algorithms.
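The sketch below fits two of the algorithms named above, linear regression and a decision tree, with scikit-learn and compares them on held-out data. The synthetic data (`y = 3x + 2` plus noise) and the `max_depth=3` setting are illustrative assumptions, not recommendations.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): y = 3x + 2 plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=200)

# Hold out a quarter of the data to evaluate on examples the models never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Both estimators share scikit-learn's fit/score interface, so they are interchangeable here.
models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(max_depth=3, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # R^2 on the held-out set
    print(f"{name}: test R^2 = {scores[name]:.3f}")

# The linear model's learned slope should land close to the true value 3.0.
print("slope:", models["linear regression"].coef_[0])
```

On this linear data the linear model should come out ahead; on data with sharp thresholds or feature interactions a tree may do better, which is why the choice depends on the problem and the data.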

# Data Visualization in ML
#### Data visualization is the practice of exploring and communicating data insights through graphical representation. Tools like [Matplotlib](https://matplotlib.org), [Seaborn](https://seaborn.pydata.org), and [Plotly](https://plotly.com) in Python, or ggplot2 in R, are used to create charts, graphs, and interactive plots. Good visualization helps in understanding complex data and in detecting outliers, errors, and patterns, and it is crucial for communicating findings to stakeholders.
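A minimal Matplotlib sketch of using plots to spot outliers: a histogram and a box plot of synthetic values with two injected outliers. The data is invented, and the figure is rendered with the non-interactive `Agg` backend into an in-memory PNG so the example runs without a display; in a notebook you would typically just display the figure.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a buffer, no window
import matplotlib.pyplot as plt
import numpy as np

# Synthetic measurements around 50, plus two injected outliers (95 and 110).
rng = np.random.default_rng(42)
values = np.concatenate([rng.normal(50, 5, size=300), [95.0, 110.0]])

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram: shows the overall shape of the distribution.
ax_hist.hist(values, bins=30)
ax_hist.set_title("Distribution")
ax_hist.set_xlabel("value")
ax_hist.set_ylabel("count")

# Box plot: points beyond the whiskers are drawn individually as outliers.
ax_box.boxplot(values)
ax_box.set_title("Box plot (outliers as points)")

fig.tight_layout()
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```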


# Python Programming for ML
#### Python is a versatile, high-level language widely adopted in machine learning for its simplicity and the extensive availability of libraries and frameworks. Libraries like [NumPy](https://numpy.org) for numerical operations, [pandas](https://pandas.pydata.org) for data manipulation, [scikit-learn](https://scikit-learn.org/stable/) for machine learning, and [TensorFlow](https://www.tensorflow.org) and [PyTorch](https://pytorch.org) for deep learning form the core stack for ML in Python.


# Jupyter Notebooks for ML
#### Jupyter Notebooks offer an interactive coding environment where you can write and execute code, visualize data, and document the process using Markdown. They are highly favored for ML projects because they combine code, output, and annotations in a single document, which makes them ideal for experiments, exploratory data analysis, and educational purposes.

# ML Tech Stack

- #### `NumPy` is used for handling numerical operations. Its arrays hold elements of a single data type and support vectorized operations, which makes them much faster and more compact than traditional Python lists for numerical work. NumPy arrays form the core structure that pandas and other libraries build upon.

- #### `Pandas` is used for data manipulation and analysis. It offers data structures such as DataFrames, which make it easy to load, manage, and manipulate tabular data. Pandas is typically used for data cleaning, filtering, and transformation tasks.

- #### `Matplotlib` is employed for creating static, interactive, and animated visualizations in Python. It's useful for plotting graphs and charts, which are essential for data exploration and results presentation.

- #### `Seaborn` builds on Matplotlib, offering a higher-level interface for drawing attractive and informative statistical graphics. It's often used alongside pandas for seamless data visualization tasks.

- #### `Scikit-learn` is the go-to library for classical machine learning. It covers tasks such as data preprocessing, feature extraction, and modeling with algorithms like linear regression, decision trees, and clustering methods such as k-means.

- #### `TensorFlow` is a powerful library for deep learning developed by Google. It's used extensively for constructing and training neural networks with large datasets, often for applications in image and speech recognition, natural language processing, and more.

- #### `PyTorch`, originally developed by Meta (formerly Facebook), is another library for deep learning. It's known for its flexibility and dynamic computational graph, which make it particularly friendly for research and development. Neural networks can be adjusted quickly and easily, making it a preferred choice for experimentation.
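The core of this stack can be sketched end to end on synthetic data: NumPy generates the arrays, pandas wraps them in a labeled DataFrame, and scikit-learn chains scaling and a classifier into a single pipeline. The data, the column names (`x1`, `x2`, `label`), and the choice of `LogisticRegression` are illustrative assumptions; TensorFlow and PyTorch are left out to keep the example small.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# NumPy: 500 x 2 float64 features generated in one call; labels from a vectorized rule.
rng = np.random.default_rng(7)
features = rng.normal(size=(500, 2))
labels = (features[:, 0] + 0.5 * features[:, 1] > 0).astype(int)
print("feature array size in bytes:", features.nbytes)  # 1000 float64 values x 8 bytes

# pandas: wrap the arrays in a labeled DataFrame for inspection and cleaning.
df = pd.DataFrame(features, columns=["x1", "x2"])
df["label"] = labels
print(df.describe().loc[["mean", "std"]])

# scikit-learn: chain preprocessing and a classifier into one estimator.
X_train, X_test, y_train, y_test = train_test_split(
    df[["x1", "x2"]], df["label"], test_size=0.2, random_state=0
)
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # fraction of correct test predictions
print(f"test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the model in one pipeline means the scaler is fitted only on the training split during `fit`, which avoids leaking test-set statistics into preprocessing.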