# Complete Data Science Environment Setup Guide

**Welcome to Data Science!** 

This notebook will guide you through setting up a complete data science development environment on your computer. By the end of this guide, you'll have all the essential tools and libraries needed to start your data science journey.

## What You'll Install

1. **Anaconda** - Python distribution with package management
2. **Python** - Programming language (comes with Anaconda)
3. **VS Code** - Modern code editor
4. **Git** - Version control system
5. **Essential Libraries** - NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn, TensorFlow

---

## Part 1: Installing Anaconda

### What is Anaconda?

Anaconda is a distribution of Python and R for scientific computing and data science. It includes:
- Python interpreter
- 250+ pre-installed packages
- Conda package manager
- Jupyter Notebook
- Environment management tools

### Installation Steps

#### For Windows:

1. **Download Anaconda**
   - Visit: https://www.anaconda.com/download
   - Download the Windows installer (64-bit recommended)
   - File size: roughly 1 GB (varies by release)

2. **Run the Installer**
   - Double-click the downloaded `.exe` file
   - Click "Next" to proceed
   - Accept the license agreement
   - Choose installation type:
     - **Recommended**: "Just Me" (no admin privileges needed)
   - Choose installation location:
     - Default: `C:\Users\YourName\anaconda3`

3. **Advanced Options**
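   - **Optional**: "Add Anaconda to my PATH environment variable". The installer itself marks this as not recommended, so leave it unchecked and use **Anaconda Prompt** unless you need `conda` in every terminal.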
   - ☑️ **Check**: "Register Anaconda as my default Python"
   - Click "Install" (takes 5-10 minutes)

4. **Verify Installation**
   - Open **Anaconda Prompt** from Start Menu
   - Type: `conda --version`
   - Type: `python --version`

#### For macOS:

1. **Download Anaconda**
   - Visit: https://www.anaconda.com/download
   - Download the macOS installer (.pkg file) that matches your Mac: Apple Silicon or Intel

2. **Install**
   - Double-click the downloaded `.pkg` file
   - Follow the installation wizard
   - Enter your password when prompted

3. **Verify Installation**
   - Open **Terminal**
   - Type: `conda --version`
   - Type: `python --version`

#### For Linux (Ubuntu/Debian):

1. **Download Anaconda**
   - Visit: https://www.anaconda.com/download
   - Download the Linux installer (.sh file)

2. **Install via Terminal**
   ```bash
   # Navigate to Downloads folder
   cd ~/Downloads
   
   # Make the installer executable
   chmod +x Anaconda3-*-Linux-x86_64.sh
   
   # Run the installer
   bash Anaconda3-*-Linux-x86_64.sh
   
   # Follow prompts, press ENTER to review license
   # Type 'yes' to accept license
   # Press ENTER to confirm installation location
   # Type 'yes' to initialize Anaconda
   ```

3. **Activate Changes**
   ```bash
   source ~/.bashrc
   ```

4. **Verify Installation**
   ```bash
   conda --version
   python --version
   ```

### Expected Output

Exact version numbers depend on the release you installed, for example:

```
conda 23.x.x
Python 3.11.x
```
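
The same check can be done from Python, which is handy later inside notebooks. This minimal sketch uses the standard library's `shutil.which` to report where `conda` and `python` resolve on your `PATH`; `find_tools` is an illustrative helper name, not part of Anaconda.

```python
import shutil
import sys


def find_tools(names=("conda", "python")):
    """Map each command name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Show which interpreter is running this code
    print("This Python:", sys.version.split()[0], "at", sys.executable)
    # Show where each command-line tool resolves on PATH
    for tool, path in find_tools().items():
        print(f"{tool:>6}: {path or 'NOT FOUND on PATH'}")
```

On Windows, run it from Anaconda Prompt; in an ordinary Command Prompt, `conda` will usually show as not found unless Anaconda was added to PATH.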

**Congratulations!** Anaconda and Python are now installed.

## Part 2: Installing Visual Studio Code (VS Code)

### What is VS Code?

VS Code is a powerful, free code editor with excellent support for Python, Jupyter notebooks, and data science workflows.

### Installation Steps

#### For Windows:

1. **Download VS Code**
   - Visit: https://code.visualstudio.com/
   - Click "Download for Windows"

2. **Install**
   - Run the downloaded installer
   - Accept the agreement
   - **Important**: Check these options:
     - ☑️ Add "Open with Code" action to context menu
     - ☑️ Add to PATH
   - Click "Install"

#### For macOS:

1. **Download VS Code**
   - Visit: https://code.visualstudio.com/
   - Download for macOS

2. **Install**
   - Open the downloaded `.zip` file
   - Drag "Visual Studio Code" to Applications folder
   - Launch from Applications

#### For Linux:

```bash
# Using snap (Ubuntu)
sudo snap install code --classic

# Or download the .deb package from https://code.visualstudio.com/ and install it
# with apt, which also pulls in any missing dependencies
sudo apt install ./code_*.deb
```

### Installing Essential VS Code Extensions

After launching VS Code:

1. Click the **Extensions** icon (or press `Ctrl+Shift+X`; `Cmd+Shift+X` on Mac)
2. Search and install:
   - **Python** (by Microsoft) - Essential for Python development
   - **Jupyter** (by Microsoft) - For working with notebooks
   - **Pylance** (by Microsoft) - Enhanced Python language support (normally installed automatically with the Python extension)
   - **GitLens** - Advanced Git integration (optional)

### Configure VS Code for Python

1. Open VS Code
2. Press `Ctrl+Shift+P` (or `Cmd+Shift+P` on Mac)
3. Type: "Python: Select Interpreter"
4. Choose the Anaconda Python interpreter (usually has 'anaconda3' in the path)
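
To double-check which interpreter VS Code is using, run the sketch below in a Python file or notebook inside VS Code. It relies on the fact that conda environments, including the base Anaconda install, keep their package metadata in a `conda-meta` folder under `sys.prefix`. Treat it as a heuristic; the helper name `running_under_conda` is illustrative.

```python
import os
import sys


def running_under_conda(prefix=sys.prefix):
    """Heuristic: conda environments keep package metadata in <prefix>/conda-meta."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Environment prefix:", sys.prefix)
    if running_under_conda():
        print("This looks like an Anaconda/conda interpreter.")
    else:
        print("This does not look like a conda interpreter; re-run 'Python: Select Interpreter'.")
```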

**VS Code is ready for data science!**

## Part 3: Installing Git

### What is Git?

Git is a version control system that tracks changes in your code, enables collaboration, and helps you manage different versions of your projects.

### Installation Steps

#### For Windows:

1. **Download Git**
   - Visit: https://git-scm.com/download/win
   - Download the installer

2. **Install**
   - Run the installer
   - **Recommended settings**:
     - Editor: Use Visual Studio Code as Git's default editor
     - PATH: Git from the command line and 3rd-party software
     - Line ending conversions: Checkout Windows-style, commit Unix-style
     - Terminal: Use MinTTY
   - Click "Install"

3. **Verify Installation**
   - Open Command Prompt or Git Bash
   - Type: `git --version`

#### For macOS:

**Option 1: Using Homebrew (Recommended)**
```bash
# Install Homebrew if not already installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install Git
brew install git
```

**Option 2: Download Installer**
- Visit: https://git-scm.com/download/mac
- Download and run the installer

#### For Linux:

```bash
# Ubuntu/Debian
sudo apt update
sudo apt install git

# Fedora
sudo dnf install git

# Verify installation
git --version
```

### Configure Git

After installation, configure your identity:

```bash
# Set your name
git config --global user.name "Your Name"

# Set your email
git config --global user.email "your.email@example.com"

# Set default branch name
git config --global init.defaultBranch main

# Verify configuration
git config --list
```
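
`git config --list` prints every setting. If you only want to confirm the three values set above (for example, from a notebook), a small sketch using `subprocess` and `git config --global --get` works. `read_git_config` is a hypothetical helper. It returns `None` when a key is unset (Git exits with status 1) or when Git is not on `PATH`.

```python
import subprocess


def read_git_config(key):
    """Return the global Git setting for `key`, or None if unset or Git is missing."""
    try:
        result = subprocess.run(
            ["git", "config", "--global", "--get", key],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:  # git executable not on PATH
        return None
    if result.returncode != 0:  # key not set (or another git error)
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    for key in ("user.name", "user.email", "init.defaultBranch"):
        print(f"{key}: {read_git_config(key) or '(not set)'}")
```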

**Git is installed and configured!**

## Part 4: Creating a Data Science Environment

### Why Create Environments?

Conda environments allow you to:
- Keep project dependencies isolated
- Avoid version conflicts between packages
- Easily share your setup with others
- Switch between different Python versions

### Creating Your First Environment

Open **Anaconda Prompt** (Windows) or **Terminal** (Mac/Linux) and run these commands:

```bash
# Create a new environment named 'datasci' with Python 3.10
conda create -n datasci python=3.10 -y
```

**What this does:**
- Creates an environment named `datasci`
- Installs Python 3.10
- `-y` flag automatically confirms the installation

### Environment Commands Reference

```bash
# Activate the environment
conda activate datasci

# Deactivate the current environment
conda deactivate

# List all environments
conda env list

# Remove an environment
conda env remove -n datasci

# Create environment from a file
conda env create -f environment.yml

# Export current environment
conda env export > environment.yml
```
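
The `environment.yml` file used by `conda env create -f` is plain YAML that lists the environment name, channels and dependencies. As a minimal sketch, assuming the `defaults` channel and an illustrative package list, the helper below generates such a file for `datasci`. A file produced by `conda env export` will be longer because it pins exact versions and builds.

```python
def make_environment_yml(name, python_version, packages, channels=("defaults",)):
    """Build the text of a minimal conda environment.yml file."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines.append(f"  - python={python_version}")
    lines += [f"  - {pkg}" for pkg in packages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative package list; adjust to your project
    print(make_environment_yml(
        "datasci", "3.10", ["numpy", "pandas", "matplotlib", "seaborn", "scikit-learn"]
    ))
```

Save the printed text as `environment.yml` and run `conda env create -f environment.yml` to rebuild the environment on another machine.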

### Activating Your Environment

**Always activate your environment before working:**

```bash
conda activate datasci
```

You'll see `(datasci)` appear at the beginning of your command prompt, indicating the environment is active.

```
(datasci) C:\Users\YourName>
```
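
`conda activate` also sets the `CONDA_DEFAULT_ENV` environment variable (to the environment's name when you activate it by name). A script can check it and refuse to run outside `datasci`. This is a minimal sketch; the helper name and the expected environment name are illustrative. The variable reflects the shell that launched Python, so it is most reliable in scripts run from the terminal.

```python
import os


def check_active_env(expected="datasci"):
    """Return (ok, active_name) using the CONDA_DEFAULT_ENV variable set by `conda activate`."""
    active = os.environ.get("CONDA_DEFAULT_ENV")
    return active == expected, active


if __name__ == "__main__":
    ok, active = check_active_env()
    if ok:
        print(f"Environment '{active}' is active.")
    else:
        print(f"Expected 'datasci' but the active environment is {active!r}; run: conda activate datasci")
```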

## Part 5: Installing Essential Data Science Libraries

### Overview of Libraries

| Library | Purpose | Key Features |
|---------|---------|-------------|
| **NumPy** | Numerical computing | Arrays, matrices, mathematical functions |
| **Pandas** | Data manipulation | DataFrames, data cleaning, analysis |
| **Matplotlib** | Data visualization | Static plots, charts, graphs |
| **Seaborn** | Statistical visualization | Beautiful statistical graphics |
| **Scikit-learn** | Machine learning | Classification, regression, clustering |
| **TensorFlow** | Deep learning | Neural networks, deep learning models |

### Installation Commands

Make sure your `datasci` environment is activated, then run:

```bash
# Activate environment
conda activate datasci

# Install core libraries (NumPy, Pandas, Matplotlib, Seaborn)
conda install numpy pandas matplotlib seaborn -y

# Install Scikit-learn
conda install scikit-learn -y

# Install TensorFlow
conda install tensorflow -y
```

### Alternative: Install All at Once

```bash
# Install all essential libraries in one command
conda install numpy pandas matplotlib seaborn scikit-learn tensorflow -y
```

### Additional Useful Libraries

```bash
# Jupyter notebook and lab
conda install jupyter jupyterlab -y

# Data processing
conda install scipy openpyxl -y

# Interactive visualizations
conda install plotly -y

# Web scraping
conda install requests beautifulsoup4 -y

# SQL support
conda install sqlalchemy -y
```

### Using pip (Alternative Package Manager)

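Some packages are missing from the default conda channels, or lag behind there, and are easier to get from PyPI with pip. Install them **after** your conda packages, with the environment activated, so they go into `datasci`: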

```bash
# Activate environment first
conda activate datasci

# Install packages with pip
pip install kaggle  # Kaggle API
pip install streamlit  # For building data apps
pip install xgboost  # Gradient boosting library
```
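
From inside Python (for example, a notebook cell), running pip as `python -m pip` through `sys.executable` guarantees the packages go into the interpreter that is actually running, not into whichever `pip` comes first on `PATH`. This is a minimal sketch; `pip_install` is a hypothetical helper that only returns the command unless you pass `dry_run=False`.

```python
import subprocess
import sys


def pip_install(packages, dry_run=True):
    """Install `packages` into the environment of the running interpreter.

    Uses `python -m pip` via sys.executable so pip cannot target a different
    Python. With dry_run=True the command is only returned, not executed.
    """
    cmd = [sys.executable, "-m", "pip", "install", *packages]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    cmd = pip_install(["kaggle", "streamlit", "xgboost"])  # dry run by default
    print("Would run:", " ".join(cmd))
```

In Jupyter, the `%pip install package_name` magic achieves the same thing for the kernel's environment.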

## Part 6: Verifying Your Installation

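Let's verify that all libraries are installed correctly. Run the code cell below. It imports each library and prints its version. An `ImportError` means that library is missing from the active environment.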

### Check Library Versions

In [None]:
# Import libraries and check versions
import sys
import numpy as np
import pandas as pd
import matplotlib
import seaborn as sns
import sklearn
import tensorflow as tf

print("Python Version:", sys.version)
print("\nLibrary Versions:")
print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
print(f"Matplotlib: {matplotlib.__version__}")
print(f"Seaborn: {sns.__version__}")
print(f"Scikit-learn: {sklearn.__version__}")
print(f"TensorFlow: {tf.__version__}")

print("\nAll libraries imported successfully!")
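
The cell above stops at the first missing library. If you would rather see every problem at once, this sketch imports each module with `importlib.import_module` and records its `__version__`, or `NOT INSTALLED` on `ImportError`. The import name can differ from the package name (`sklearn` vs. `scikit-learn`). The `library_report` helper is illustrative.

```python
import importlib

# Import name -> display name (import names can differ from conda/pip package names)
LIBRARIES = {
    "numpy": "NumPy",
    "pandas": "Pandas",
    "matplotlib": "Matplotlib",
    "seaborn": "Seaborn",
    "sklearn": "Scikit-learn",
    "tensorflow": "TensorFlow",
}


def library_report(modules):
    """Map each import name to its __version__, 'unknown', or 'NOT INSTALLED'."""
    report = {}
    for module_name in modules:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = "NOT INSTALLED"
            continue
        report[module_name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for module_name, version in library_report(LIBRARIES).items():
        print(f"{LIBRARIES[module_name]:<13} {version}")
```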

## Part 7: Troubleshooting Common Issues

### Issue 1: Conda command not found

**Solution:**
```bash
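# Windows
# Use Anaconda Prompt (or add Anaconda to PATH)

# Mac/Linux: let conda set up your shell (default install location assumed)
~/anaconda3/bin/conda init bash    # macOS uses zsh by default: ~/anaconda3/bin/conda init zsh
source ~/.bashrc                   # for zsh: source ~/.zshrc (or open a new terminal)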
```

### Issue 2: Import errors after installation

**Solution:**
```bash
# Make sure environment is activated
conda activate datasci

# Reinstall the package
conda install --force-reinstall package_name
```

### Issue 3: TensorFlow installation fails

**Solution:**
```bash
# Try installing with pip instead
pip install tensorflow

# For NVIDIA GPU support (Linux or WSL2 only; the quotes stop shells such as zsh
# from treating the brackets as a glob pattern)
pip install "tensorflow[and-cuda]"
```
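
After installing, you can check whether TensorFlow actually sees a GPU. This sketch calls `tf.config.list_physical_devices("GPU")`, which is available in TensorFlow 2.x. `list_gpus` is a hypothetical helper that returns `None` when TensorFlow is not installed. An empty list means TensorFlow will fall back to the CPU.

```python
def list_gpus():
    """Return names of GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment.")
    elif gpus:
        print("GPUs visible to TensorFlow:", gpus)
    else:
        print("TensorFlow is installed but sees no GPU; it will run on the CPU.")
```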

### Issue 4: Jupyter not finding environment

**Solution:**
```bash
# Install ipykernel in your environment
conda activate datasci
conda install ipykernel
python -m ipykernel install --user --name=datasci
```
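
To confirm that the registration worked, you can list the kernels Jupyter knows about. The shell equivalent is `jupyter kernelspec list`. The sketch assumes `jupyter_client`, which ships with Jupyter, and uses its `KernelSpecManager.find_kernel_specs()` method. `installed_kernels` is an illustrative name.

```python
def installed_kernels():
    """Return {kernel_name: spec_directory} for registered Jupyter kernels.

    Returns None if jupyter_client (installed with Jupyter) is not available.
    """
    try:
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:
        return None
    return KernelSpecManager().find_kernel_specs()


if __name__ == "__main__":
    kernels = installed_kernels()
    if kernels is None:
        print("jupyter_client is not installed in this environment.")
    else:
        for name, path in sorted(kernels.items()):
            print(f"{name:<15} {path}")
        print("datasci registered:", "datasci" in kernels)
```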

### Issue 5: Permission errors (Mac/Linux)

**Solution:**
```bash
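# Never run conda itself with sudo: it leaves root-owned files in your install.
# If that already happened, take ownership back (default install location assumed):
sudo chown -R "$USER" ~/anaconda3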
```
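
Before running `chown`, you may want to see whether anything under the install directory is actually owned by another user (typically root, after an accidental `sudo conda ...`). This POSIX-only sketch (Mac/Linux, since it uses `os.getuid`) walks the tree and stops after a handful of hits. The `~/anaconda3` location and the helper name `paths_not_owned_by_me` are assumptions, and a full walk of a large install may take a minute.

```python
import os
from pathlib import Path


def paths_not_owned_by_me(root, limit=20):
    """Return up to `limit` paths under `root` not owned by the current user (POSIX only)."""
    uid = os.getuid()
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_uid != uid:
                    found.append(path)
            except OSError:  # entry vanished or cannot be inspected
                continue
            if len(found) >= limit:
                return found
    return found


if __name__ == "__main__":
    root = Path.home() / "anaconda3"  # assumed default install location
    if root.is_dir():
        bad = paths_not_owned_by_me(root)
        print("All files are owned by you." if not bad else "Owned by another user:")
        for path in bad:
            print("  ", path)
    else:
        print(f"{root} not found; adjust the path to your Anaconda install.")
```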

### Issue 6: VS Code not detecting Python

**Solution:**
1. Install the Python extension in VS Code
2. Press `Ctrl+Shift+P` (or `Cmd+Shift+P` on Mac)
3. Type "Python: Select Interpreter"
4. Choose the interpreter from your conda environment

### Getting Help

- **Conda docs**: https://docs.conda.io/
- **Python docs**: https://docs.python.org/
- **Stack Overflow**: https://stackoverflow.com/
- **VS Code docs**: https://code.visualstudio.com/docs

## Part 8: Quick Reference Card

### Environment Commands

| Command | Description |
|---------|-------------|
| `conda create -n myenv python=3.10` | Create new environment |
| `conda activate myenv` | Activate environment |
| `conda deactivate` | Deactivate environment |
| `conda env list` | List all environments |
| `conda list` | List installed packages |
| `conda install package_name` | Install a package |
| `conda update package_name` | Update a package |
| `conda remove package_name` | Remove a package |

### Git Commands

| Command | Description |
|---------|-------------|
| `git init` | Initialize repository |
| `git status` | Check status |
| `git add .` | Stage all changes |
| `git commit -m "message"` | Commit changes |
| `git push` | Push to remote |
| `git pull` | Pull from remote |
| `git clone <url>` | Clone repository |
| `git branch` | List branches |

### Jupyter Commands

| Command | Description |
|---------|-------------|
| `jupyter notebook` | Launch Jupyter Notebook |
| `jupyter lab` | Launch JupyterLab |
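| `Shift + Enter` | Run cell and select the next one |
| `Ctrl + Enter` | Run cell without moving |
| `A` | Insert cell above (command mode: press `Esc` first) |
| `B` | Insert cell below (command mode) |
| `D, D` | Delete cell (press `D` twice, command mode) |
| `M` | Change cell to Markdown (command mode) |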

### Python Import Statements

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import tensorflow as tf
```