# Title
Tools for Data Science Final Project

# Introduction to Python in Jupyter Notebook

Welcome to this Jupyter Notebook, where we will explore the world of Python programming! Jupyter Notebook is an interactive environment that lets you write and execute code, visualize data, and document your analysis, all in a single file. It's widely used by data scientists, researchers, and educators for data analysis, machine learning experiments, and much more.

## What is Python?

Python is a versatile and powerful programming language that's known for its simplicity and readability. It's widely used in various domains such as web development, scientific computing, data analysis, artificial intelligence, and more. In this notebook, we'll primarily focus on using Python for data analysis and basic programming tasks.

## How to Use This Notebook

Each cell in this notebook can contain either code (Python in our case) or Markdown text (like this cell). To run a code cell, simply click on it to select it and then press `Shift + Enter` or click the "Run" button in the toolbar above. The output, if any, will appear below the cell.

Feel free to modify and experiment with the code cells. Don't hesitate to break things; that's how we learn! If you run into problems, you can restart the kernel by clicking "Kernel" > "Restart" in the menu. This clears all variables, so rerun the cells from the top afterwards.

## Getting Started

Before we dive into coding, let's make sure everything is set up. A basic understanding of Python syntax will help, but if you're new to Python, don't worry: we'll start with the basics and build up our knowledge gradually.

Throughout this notebook, we'll cover:
- Printing and variables
- Data types and data structures
- Control flow (if statements, loops)
- Functions and libraries

Let's begin our Python journey!
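
As a preview, here is a small first cell that touches each topic in the list above. The course string, the scores, the grade cutoffs, and the `letter_grade` helper are hypothetical examples made up for illustration; everything else (`print`, lists, dicts, `for` loops, `if` statements, functions, and the standard-library `statistics` module) is plain Python.

```python
import statistics

# Printing and variables
course = "Tools for Data Science"
print("Course:", course)

# Data types and data structures: a list of ints and a dict of str -> int
scores = [72, 88, 95, 61]
grade_cutoffs = {"A": 90, "B": 80, "C": 70}  # hypothetical cutoffs, highest first

# Functions: a hypothetical helper that maps a numeric score to a letter grade
def letter_grade(score):
    """Return the first letter whose cutoff the score meets, else "F"."""
    # Control flow: a loop with an if statement (dicts keep insertion order)
    for letter, cutoff in grade_cutoffs.items():
        if score >= cutoff:
            return letter
    return "F"

# Loop over the list and print each score with its grade
for s in scores:
    print(s, "->", letter_grade(s))

# Libraries: use a function from the standard-library statistics module
print("Average score:", statistics.mean(scores))
```

Running the cell prints one line per score; change the values and rerun it to see how the output changes.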


## Data Science Languages in Jupyter Notebook

Jupyter Notebook is a popular platform for data scientists to perform various data analysis and machine learning tasks. While it supports multiple programming languages, Python is the most widely used language for data science within the Jupyter environment.

### Python

Python is the dominant language for data science in Jupyter Notebook due to its extensive libraries, vibrant community, and ease of use. Some key libraries and frameworks for data science in Python include:

- **NumPy:** A fundamental package for numerical computations in Python.
- **Pandas:** A library for data manipulation and analysis.
- **Matplotlib:** A 2D plotting library for creating visualizations.
- **Seaborn:** A statistical data visualization library based on Matplotlib.
- **Scikit-learn:** A machine learning library with various algorithms and tools.
- **TensorFlow and PyTorch:** Deep learning frameworks for building and training neural networks.

### R

While Python is the most popular language, R is another commonly used language in data science and statistics. Jupyter supports R through the IRkernel, and R has its own ecosystem of packages for data analysis and visualization:

- **dplyr:** A package for data manipulation and transformation.
- **ggplot2:** A powerful package for creating customized data visualizations.
- **caret:** A package for machine learning and model training.
- **tidyr:** A package for tidying and organizing data.

### Julia

Julia is a high-level, high-performance programming language for technical computing. It's gaining popularity in the data science community because it pairs readable, high-level syntax with execution speed that can approach compiled languages such as C. Jupyter supports Julia through the IJulia kernel, making it accessible for data analysis tasks.

- **DataFrames.jl:** A package for working with tabular data.
- **Plots.jl:** A plotting package for creating visualizations.
- **StatsBase.jl:** A package for basic statistics and data manipulation.

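Each notebook runs a single kernel (one language) at a time, but you can still mix languages within one notebook: IPython cell magics such as `%%bash` run a cell in another interpreter, and extensions such as rpy2 add an `%%R` magic for running R code from a Python notebook. This lets you choose the best language for a specific task.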


# Popular Data Science Libraries in Python

Python is widely used in data science and machine learning, thanks to its rich ecosystem of libraries. Here are some popular ones:

## Data Manipulation and Analysis
- **Pandas**: A powerful library for data manipulation and analysis, providing data structures like DataFrames.
- **NumPy**: The fundamental package for scientific computing with support for arrays and matrices.

## Visualization
- **Matplotlib**: A 2D plotting library that produces high-quality figures and visualizations.
- **Seaborn**: Built on top of Matplotlib, Seaborn provides a higher-level interface for creating attractive statistical visualizations.
- **Plotly**: Interactive visualizations and dashboards for web-based exploration.

## Machine Learning
- **Scikit-learn**: A simple and efficient tool for machine learning, including classification, regression, clustering, and more.
- **TensorFlow**: An open-source deep learning framework developed by Google.
- **PyTorch**: A popular deep learning framework known for its dynamic computational graph and user-friendly interface.

## Natural Language Processing (NLP)
- **NLTK**: The Natural Language Toolkit, a library for working with human language data.
- **spaCy**: Industrial-strength NLP library for a variety of NLP tasks.
- **Transformers**: Hugging Face's library for state-of-the-art NLP using pre-trained transformer models.

## Interactive Visualization and Mapping
- **Bokeh**: Interactive visualizations for modern web browsers.
- **Altair**: A declarative statistical visualization library.
- **Folium**: A library for creating interactive Leaflet maps.

## Data Collection and Preprocessing
- **Scrapy**: A framework for web scraping.
- **Beautiful Soup**: Library for pulling data out of HTML and XML files.
- **Feature-engine**: A library for feature engineering in machine learning pipelines.

## Time Series Analysis
- **Statsmodels**: A library for estimating and interpreting models for many different statistical needs.
- **Prophet**: An open-source forecasting tool by Facebook for time series data.

## Data I/O and Storage
- **HDF5**: A data model, library, and file format for storing and managing large amounts of numerical data; from Python it is used through packages such as h5py and PyTables.
- **SQLAlchemy**: SQL toolkit and Object-Relational Mapping (ORM) library.

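Remember to install these libraries with `pip` before using them in your Jupyter Notebook. Inside a notebook cell, write `%pip` instead of `pip` so that the packages are installed into the environment of the running kernel:
```bash
pip install pandas numpy matplotlib seaborn scikit-learn tensorflow torch nltk spacy transformers plotly bokeh altair folium scrapy beautifulsoup4 feature-engine statsmodels prophet h5py sqlalchemy
```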

## Data Science Tools in Python

| Tool               | Description                                               | Website                                |
|-------------------|-----------------------------------------------------------|----------------------------------------|
| NumPy             | Numerical computing library for arrays and matrices       | [Link](https://numpy.org/)             |
| pandas            | Data manipulation and analysis library                    | [Link](https://pandas.pydata.org/)     |
| Matplotlib        | Plotting and data visualization library                   | [Link](https://matplotlib.org/)        |
| Seaborn           | Statistical data visualization based on Matplotlib        | [Link](https://seaborn.pydata.org/)    |
| Scikit-learn      | Machine learning library for classification, regression, clustering, etc. | [Link](https://scikit-learn.org/) |
| TensorFlow        | Open-source machine learning framework                    | [Link](https://www.tensorflow.org/)    |
| PyTorch           | Deep learning framework                                   | [Link](https://pytorch.org/)           |
| Jupyter Notebook | Interactive computing environment for data analysis       | [Link](https://jupyter.org/)          |
| Spyder            | Integrated development environment for data science       | [Link](https://www.spyder-ide.org/)   |
| Statsmodels       | Statistical modeling library                              | [Link](https://www.statsmodels.org/)  |
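
Before using the libraries above, it can help to check which ones are already installed in the environment your notebook kernel runs in. The sketch below uses the standard-library function `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found. Several pip package names differ from the names you import (for example, `scikit-learn` is imported as `sklearn` and `beautifulsoup4` as `bs4`); the mapping and the helper name `missing_packages` are this sketch's own assumptions, not part of any library.

```python
import importlib.util

# pip package name -> module name used in `import` (several of them differ)
PACKAGES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "plotly": "plotly",
    "scikit-learn": "sklearn",
    "tensorflow": "tensorflow",
    "torch": "torch",
    "nltk": "nltk",
    "spacy": "spacy",
    "transformers": "transformers",
    "bokeh": "bokeh",
    "altair": "altair",
    "folium": "folium",
    "scrapy": "scrapy",
    "beautifulsoup4": "bs4",
    "feature-engine": "feature_engine",
    "statsmodels": "statsmodels",
    "prophet": "prophet",
    "h5py": "h5py",
    "sqlalchemy": "sqlalchemy",
}

def missing_packages(packages):
    """Return the pip names whose import module cannot be found (hypothetical helper)."""
    # find_spec locates a top-level module without importing it
    return [pip_name for pip_name, module in packages.items()
            if importlib.util.find_spec(module) is None]

missing = missing_packages(PACKAGES)
if missing:
    print("Missing packages; install with: pip install " + " ".join(missing))
else:
    print("All packages are available.")
```

Checking with `find_spec` rather than actually importing each package keeps the check fast, since large frameworks such as TensorFlow and PyTorch are not loaded.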

# Arithmetic Expressions in Python

Arithmetic expressions combine numbers, variables, and operators to compute a value. Python supports arithmetic operators for addition, subtraction, multiplication, division, exponentiation, and more.

## Basic Arithmetic Operators

Here are some examples of basic arithmetic operations using Python:

1. Addition: `2 + 3` equals 5
2. Subtraction: `10 - 5` equals 5
3. Multiplication: `4 * 6` equals 24
4. Division: `15 / 3` equals 5.0 (always a floating-point result)
5. Floor (Integer) Division: `15 // 3` equals 5 (rounds the quotient down; the result is an integer when both operands are integers)
6. Modulus (Remainder): `17 % 5` equals 2
7. Exponentiation: `2 ** 3` equals 8
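
You can check each result from the list above in a code cell. The last two lines go slightly beyond the list: they show that `//` rounds toward negative infinity (which is why it is called floor division) and that it returns a float when either operand is a float.

```python
# Each basic operator applied to the example values from the list
print(2 + 3)    # 5
print(10 - 5)   # 5
print(4 * 6)    # 24
print(15 / 3)   # 5.0  (true division always returns a float)
print(15 // 3)  # 5    (floor division of two ints returns an int)
print(17 % 5)   # 2
print(2 ** 3)   # 8

# // rounds toward negative infinity, and % follows the same rule,
# so (a // b) * b + (a % b) == a holds for negative numbers too
print(-17 // 5, -17 % 5)  # -4 3
print(7.5 // 2)           # 3.0  (a float operand gives a float result)
```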

## Operator Precedence

Python follows the standard mathematical rules of operator precedence when evaluating arithmetic expressions. Here is the order, from highest to lowest:

1. Parentheses `()`
2. Exponentiation `**`
3. Unary plus and minus `+x`, `-x`
4. Multiplication `*`, Division `/`, Floor Division `//`, Modulus `%`
5. Addition `+`, Subtraction `-`

Operators on the same level are evaluated from left to right, except `**`, which groups from right to left. Use parentheses to control the order of operations and to make complex expressions easier to read.
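
The precedence rules can be verified directly in a cell. Each comment spells out how Python groups the expression; the two exponentiation cases show where Python's rules are easy to get wrong: `**` groups right to left and binds more tightly than a unary minus written to its left.

```python
# Multiplication is evaluated before addition
print(2 + 3 * 4)      # 14
# Parentheses override the default order
print((2 + 3) * 4)    # 20
# Operators on the same level are evaluated left to right
print(10 - 4 - 3)     # 3, i.e. (10 - 4) - 3
print(16 / 4 / 2)     # 2.0, i.e. (16 / 4) / 2
# Exponentiation is the exception: it groups right to left
print(2 ** 3 ** 2)    # 512, i.e. 2 ** (3 ** 2)
# ** binds more tightly than a unary minus on its left
print(-2 ** 2)        # -4, i.e. -(2 ** 2)
print((-2) ** 2)      # 4
```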

## Variables in Arithmetic Expressions

You can also use variables in arithmetic expressions:

```python
x = 10
y = 3
result = x + y * 2  # Evaluates to 16
```

```python
# Multiply and Add Numbers
num1 = 5
num2 = 10

# Multiply
product = num1 * num2

# Add
sum_result = num1 + num2

# Display results
print("Product:", product)
print("Sum:", sum_result)
```

```python
# Input: minutes
minutes = 150

# Convert minutes to hours
hours = minutes / 60

# Display the result
print(f"{minutes} minutes is equal to {hours:.2f} hours")
```

Output:

150 minutes is equal to 2.50 hours
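
The conversion above produces fractional hours. If you want whole hours plus leftover minutes instead, the floor division (`//`) and modulus (`%`) operators do exactly that, and the built-in `divmod()` returns both results at once. The variable names below are illustrative choices, not part of the original cell.

```python
minutes = 150

# Floor division gives the whole hours, modulus gives the leftover minutes
whole_hours = minutes // 60
leftover_minutes = minutes % 60
print(f"{minutes} minutes is {whole_hours} h {leftover_minutes} min")

# divmod() returns the same two values as a (quotient, remainder) tuple
h, m = divmod(minutes, 60)
print(f"{minutes} minutes is {h} h {m} min")
```

Both `print` calls display `150 minutes is 2 h 30 min`.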


## Objectives

- Learn about the Python programming language
- Understand Jupyter Notebooks and their features
- Explore data manipulation and analysis
- Practice data visualization with Matplotlib
- Gain knowledge about machine learning basics
- Apply machine learning algorithms to real datasets

# Author's Name
Author: Ryan Craft