#  Section 1: Getting Started with Python Environments

To do data science in Python, you need a **coding environment**.  
There are a few options, and you should be aware of them before we start.


In [2]:
# Run this cell to check Python version
!python --version

Python 3.11.8


## Option 1: Google Colab

- Runs in your browser, nothing to install.  
- Already comes with **most packages pre-installed** (e.g., pandas, numpy, matplotlib).  
- Runs on a **hosted virtual machine** behind the scenes, so the environment and its dependencies are managed for you.  

 If you’re new, we recommend starting here.


## Option 2: Local IDEs

If you want more control, you can install Python locally and use an IDE.  
Popular options include:  
- **VS Code** → Lightweight, customizable, very popular.  
- **Cursor** → AI-powered IDE (free for students: [Cursor for Students](https://cursor.com/en/students)).  
- **PyCharm** → Professional IDE, excellent for large projects.  

 In a local environment, **you must install your own packages**.


##  Virtual Environments with Conda

When you install Python packages globally, different projects can **conflict** with each other:  
- One project might need `pandas==1.5`, another might need `pandas==2.0`.  
- Updating a library for one project could **break** another.  

 That’s why we use **virtual environments**: they isolate each project, keeping dependencies separate and reproducible.

---

###  Install Conda
If you don’t already have it, install **Miniconda** (recommended) or **Anaconda**:  
- **Download**: [Download Miniconda](https://www.anaconda.com/download/success)

Follow the installer instructions for your system.

---

### Create Virtual Environments
Each project should get its own environment.

Example:  
```bash
# Create an environment for project A
conda create -n projectA python=3.11

# Create another environment for project B
conda create -n projectB python=3.9
```
**On Windows, run the following commands in the Anaconda Prompt.**

### Activate / Swap Between Environments

The same command works on Windows, macOS, and Linux. `conda activate` has been the recommended form since conda 4.4; the older `source activate` is a legacy form.
```bash
conda activate projectA
conda activate projectB
```
To go back to the default environment:
```
conda deactivate
```

### Install Packages in an Environment

Once inside the environment, install dependencies:

```
conda install pandas numpy matplotlib
```
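
After installing, it can help to confirm from Python that the packages are visible to the active environment. The sketch below uses the standard library's `importlib.metadata`; the helper name `installed_version` is an illustrative assumption, not part of conda or pandas.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Package names taken from the install command above.
for name in ["pandas", "numpy", "matplotlib"]:
    found = installed_version(name)
    print(f"{name}: {found if found is not None else 'not installed'}")
```

If a package shows as not installed, the most likely cause is that the command was run in a different environment than the one the notebook is using.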

### Check Which Environment You’re In

This lists every environment; the active one is marked with an asterisk (`*`):

```
conda info --envs
```
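
The same question can be asked from inside a notebook or script. This is a minimal sketch using only the standard library; the function name `describe_environment` is hypothetical, and the `CONDA_DEFAULT_ENV` variable is only set when a conda environment has been activated.

```python
import os
import sys


def describe_environment():
    """Collect a few facts about the interpreter that is running this code."""
    return {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "executable": sys.executable,  # path to the running interpreter
        "prefix": sys.prefix,          # root folder of the active environment
        # conda sets this variable on activation; it is None outside conda
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```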

### Delete an Environment
```
conda env remove -n projectA
```

#  AI in This Course and Your Career

Colab includes **Gemini AI** directly in the interface. The professor encourages you to use it, but please **use it wisely**.

---

## Lessons from Vibe Coding
- You **need** to have **good taste**: judgment matters more than ever.  
- It’s easy to **lose track** once a project gets beyond 10k lines, and you will lose track much sooner if you are not using AI properly.
- **AI is only as good as your own programming skill** — fundamentals come first.  

---

##  How This Affects Professional Careers
- **Senior programmers** get a **boost**, because AI takes over much of the “junior-level” coding.  
- **Junior programmers** risk losing the chance to practice and build up **experience and taste**.  

 That means *your time as a student is the best (and maybe only) chance* to develop these skills.  
Don’t rob yourself of the opportunity by leaning on AI too early.  

---

##  How We’ll Use AI in This Course
We **embrace AI**, but in structured levels:

- **Level 1: AI as a Coding Assistant**  
  - Syntax help (“How do I rename a column in Pandas?”)  
  - Debugging help (“Why am I getting a KeyError?”)  

- **Level 2: AI as a Data Processor (via APIs)**  
  - Extract named entities (people, places, etc.) from Tweets.  
  - Augment a dataset with external information (e.g., add NYT headlines).  

- **Level 3: AI as an Analysis Pipeline**  
  - Brainstorming, planning, and auditing your workflow.  
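
The two Level 1 questions above can usually be answered in a few lines. The sketch below uses a hypothetical two-row DataFrame to show a column rename and the most common cause of a `KeyError`, which is asking for a label that is not in `df.columns`.

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration.
df = pd.DataFrame({"sepal length (cm)": [5.1, 4.9], "target": [0, 0]})

# Syntax question: rename() returns a new DataFrame; df itself is unchanged.
renamed = df.rename(columns={"sepal length (cm)": "sepal_length"})
print(list(renamed.columns))

# Debugging question: a KeyError means the label is not in df.columns.
try:
    df["sepal_length"]  # the original frame still has the old name
except KeyError as err:
    print("KeyError for column:", err)
    print("Available columns:", list(df.columns))
```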

---

##  Cautions
- Don’t ask AI to **do the assignment for you**.  
- Always **try first**, then use AI if you are stuck.  
- Test AI’s code: sometimes it’s outdated or slightly wrong.
- Data science is iterative: each code section relies on the one before it, so AI-generated code that you use blindly and don’t understand will eventually fail you.
- Remember: **copying code ≠ understanding code**.  
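
One way to act on "test AI's code" is to check a suggested helper against values you worked out by hand before relying on it. This is a minimal sketch; the `average` function stands in for any AI-suggested snippet and is purely hypothetical.

```python
def average(values):
    """Candidate helper, e.g. suggested by an AI assistant (hypothetical)."""
    return sum(values) / len(values)


# Check against results worked out by hand before using the helper.
assert average([3, 5, 8]) == 16 / 3
assert average([10]) == 10

# Probe an edge case: what happens with an empty list?
try:
    average([])
except ZeroDivisionError:
    print("average([]) raises ZeroDivisionError; decide how to handle it")
```

If an assertion fails, or an edge case surprises you, that is the cue to read the code line by line rather than paste in another suggestion.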

---

##  Best Practice While Coding
1. **Attempt first**, even if it feels slower.  
2. **Use AI second**, as a helper, not a crutch.
3. **Verify**: run and check AI’s output in your notebook.  
4. **Reflect**: ask yourself, *Do I understand why this works?*  

---

 **Key Takeaway**:  
AI is an **assistant** or a **coach**, not a **replacement**.  
Your junior coding years are where you build your foundations; don’t skip them.


# Section 2: Refresh on Python Basics

**Objectives:**
- Recall variables, lists, loops, and conditionals.
- Practice basic Python commands inside Jupyter.


In [None]:
# Variables and types
a = 10
b = 3.14
c = "hello"
d = True

print(type(a), type(b), type(c), type(d))


<class 'int'> <class 'float'> <class 'str'> <class 'bool'>


In [3]:
# Basic operations
x = 5
print("x squared:", x ** 2)
print("x divided by 2:", x / 2)
print("floor division:", x // 2)
print("modulo:", x % 2)


x squared: 25
x divided by 2: 2.5
floor division: 2
modulo: 1


In [4]:
# Lists and indexing
numbers = [10, 20, 30, 40, 50]
print("First element:", numbers[0])
print("Last element:", numbers[-1])
print("Slice [1:3]:", numbers[1:3])

First element: 10
Last element: 50
Slice [1:3]: [20, 30]


In [5]:
# Loops and conditionals
for num in numbers:
    if num > 25:
        print(num, "is greater than 25")
    else:
        print(num, "is not greater than 25")

10 is not greater than 25
20 is not greater than 25
30 is greater than 25
40 is greater than 25
50 is greater than 25


###  Exercises
1. Create a variable that stores your name. Print "Hello, [your name]!"  
2. Make a list of three numbers. Print the sum and average.  
3. Write a loop that prints whether each number is even or odd.  
4. Bonus: Write a function `square_list(lst)` that returns a list of squares.

In [None]:
# Question 1. 

name = "Myles"

print(f"Hello, {name}!")

# Question 2. 

number_list = [3, 5, 8]

added_together = sum(number_list)

averaged = added_together / len(number_list)

print(f"Sum of number list: {added_together}")
print(f"Average of numbers: {averaged}")

# Question 3.

for number in number_list:
    if number % 2 == 0:
        print("The number is even")
    else:
        print("The number is odd")

# Question 4. 

def square_list(lst):
    result = []
    for number in lst:
        result.append(number ** 2)
    return result

list_of_numbers = [1, 2, 3, 4, 5]
list_of_numbers = square_list(list_of_numbers)
print(f"List of squares 1-5: {list_of_numbers}")
 

Hello, Myles!
Sum of number list: 16
Average of numbers: 5.333333333333333
The number is odd
The number is odd
The number is even
List of squares 1-5: [1, 4, 9, 16, 25]
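
The loop-based `square_list` in the answer above can also be written as a list comprehension, which is the more common idiom for building one list from another. This is an alternative sketch, not a required form for the exercise.

```python
def square_list(lst):
    """Return a new list containing the square of each item in lst."""
    return [number ** 2 for number in lst]


squares = square_list([1, 2, 3, 4, 5])
print(squares)
```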


# Section 3: Working with Pandas

**Objectives:**
- Load datasets into Pandas DataFrames.
- Save and reload data using CSVs.
- Understand DataFrame vs Series.

 Note: This dataset is **clean**. Real-world data is usually messy (missing values, typos, inconsistent encodings). *Data cleaning is often estimated to take 70 to 80 percent of the job.*
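
To give a sense of what "messy" looks like, here is a small sketch on made-up data (the `messy` frame and its `species` column are hypothetical and not part of the iris dataset loaded below). It counts missing values and normalizes inconsistent text labels.

```python
import pandas as pd

# Hypothetical messy data: a missing value and an inconsistent label.
messy = pd.DataFrame({
    "species": ["setosa", "Setosa ", "virginica"],
    "petal length (cm)": [1.4, None, 5.2],
})

# Count missing values per column.
missing_counts = messy.isna().sum()
print(missing_counts)

# Normalize text labels: strip whitespace and lowercase.
messy["species"] = messy["species"].str.strip().str.lower()
print(messy["species"].unique())
```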


In [15]:
import pandas as pd
from sklearn.datasets import load_iris

# Load dataset
iris_data = load_iris(as_frame=True)
iris_df = iris_data.frame

iris_df.head()


| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


In [16]:
# Save dataset to CSV
iris_df.to_csv("iris.csv", index=False)

# Load it back
iris = pd.read_csv("iris.csv")

iris.head()


| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
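
The `index=False` argument in the save step matters for the round trip. The sketch below, on a hypothetical two-row frame and using an in-memory buffer instead of a file, shows that writing the index adds an extra column that `read_csv` names `Unnamed: 0` when the file is loaded back.

```python
import io

import pandas as pd

# Hypothetical small frame, used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

# Without index=False the row labels are written as an unnamed first column.
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))

# With index=False only the data columns are written.
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(list(without_index.columns))
```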


###  Exercises
1. Save the first 20 rows of `iris` to `iris_sample.csv`.  
2. Load `iris_sample.csv` into `iris_small`.  
3. Confirm `iris_small.shape` is `(20, 5)`.  
4. Check if `iris_small["petal width (cm)"]` is a Series or DataFrame.


In [None]:
# Question 1.

iris_df.head(20).to_csv("iris_sample.csv", index=False)

# Question 2. 

iris_small = pd.read_csv("iris_sample.csv")

# Question 3. Yes

print(f"{iris_small.shape}")

# Question 4. Series

type(iris_small["petal width (cm)"])

(20, 5)


pandas.core.series.Series
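
The Series-versus-DataFrame distinction in Question 4 comes down to the brackets used. This sketch, on a hypothetical three-row frame, selects the same column both ways and compares the resulting types and shapes.

```python
import pandas as pd

# Hypothetical three-row frame, used only for illustration.
df = pd.DataFrame({"petal width (cm)": [0.2, 0.2, 0.4], "target": [0, 0, 0]})

one_column = df["petal width (cm)"]     # single brackets -> Series
same_column = df[["petal width (cm)"]]  # list of names -> DataFrame

print(type(one_column).__name__, one_column.shape)
print(type(same_column).__name__, same_column.shape)
```

A Series is one-dimensional, so its shape has a single entry; the one-column DataFrame keeps both dimensions.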

# Section 4: Exploring DataFrames

**Objectives:**
- Preview and inspect DataFrames.
- Learn what rows and columns represent.
- Explore dataset dimensions.


In [77]:
# Preview
iris.head()      # first 5 rows
iris.tail()      # last 5 rows
iris.head(10)    # first 10 rows; only this last expression is displayed

# Not Pandas but also very cool and useful
# from google.colab import sheets
# sheet = sheets.InteractiveSheet(df=iris)

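| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | 0 |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | 0 |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | 0 |
| 8 | 4.4 | 2.9 | 1.4 | 0.2 | 0 |
| 9 | 4.9 | 3.1 | 1.5 | 0.1 | 0 |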


In [32]:
# Columns and data types
print("Column names:", iris.columns)
print("Data types:")
print(iris.dtypes)

Column names: Index(['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)',
       'petal width (cm)', 'target'],
      dtype='object')
Data types:
sepal length (cm)    float64
sepal width (cm)     float64
petal length (cm)    float64
petal width (cm)     float64
target                 int64
dtype: object


In [33]:
# Shape of the dataset
print("Shape:", iris.shape)
print("Number of rows:", len(iris))
print("Number of columns:", len(iris.columns))

Shape: (150, 5)
Number of rows: 150
Number of columns: 5


In [34]:
# Inspect a single row
iris.iloc[0]


sepal length (cm)    5.1
sepal width (cm)     3.5
petal length (cm)    1.4
petal width (cm)     0.2
target               0.0
Name: 0, dtype: float64
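
In the output above, `target` appears as `0.0` even though the column holds integers. A single row is returned as a Series, and a Series has one dtype, so pandas converts the mixed values to a common type. The sketch below reproduces this on a hypothetical two-row frame.

```python
import pandas as pd

# Hypothetical frame with one float column and one integer column.
df = pd.DataFrame({"petal width (cm)": [0.2, 0.4], "target": [0, 1]})

print(df.dtypes)      # one dtype per column

row = df.iloc[0]      # one row across both columns
print(row.dtype)      # a single dtype for the whole row
print(row["target"])  # the integer is now stored as a float
```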

###  Exercises
1. Use `.head(15)` to preview the first 15 rows.  
2. Print all column names.  
3. Confirm dataset has 150 rows with `.shape`.  
4. Select row 25 with `.iloc`. What does it represent?


In [76]:
# Question 1. 

print(iris.head(15))

# Question 2. 

print("Column names:", iris.columns)

# Question 3.

print(iris.shape)
print("Rows:", len(iris))

# Question 4. 

print("Row 25:\n", iris.iloc[25])

    sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                 5.1               3.5                1.4               0.2   
1                 4.9               3.0                1.4               0.2   
2                 4.7               3.2                1.3               0.2   
3                 4.6               3.1                1.5               0.2   
4                 5.0               3.6                1.4               0.2   
5                 5.4               3.9                1.7               0.4   
6                 4.6               3.4                1.4               0.3   
7                 5.0               3.4                1.5               0.2   
8                 4.4               2.9                1.4               0.2   
9                 4.9               3.1                1.5               0.1   
10                5.4               3.7                1.5               0.2   
11                4.8               3.4 

4. Row 25 represents a single observation, with one value recorded for each column. Because Python uses zero-based indexing (counting starts from zero), what we are actually seeing is the 26th observation in the data.
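
The zero-based counting described in the answer can be checked directly. This sketch uses a hypothetical frame whose only column, `observation_number`, counts rows starting at 1, so the position and the count can be compared side by side.

```python
import pandas as pd

# Hypothetical frame whose single column counts observations starting at 1.
df = pd.DataFrame({"observation_number": range(1, 151)})

row = df.iloc[25]                 # position 25, counting from zero
print(row["observation_number"])  # which observation is this?

# .iloc[0] is the first row and .iloc[-1] is the last.
print(df.iloc[0]["observation_number"], df.iloc[-1]["observation_number"])
```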

# Section 5: Selecting and Filtering Data

**Objectives:**
- Select rows with `.iloc`.
- Select columns by name.
- Use boolean filtering.


In [39]:
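# Selecting rows by range (the end position is excluded)
iris.iloc[0:5]      # rows 0 through 4
iris.iloc[10:20]    # rows 10 through 19; only this last expression is displayed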

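| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 10 | 5.4 | 3.7 | 1.5 | 0.2 | 0 |
| 11 | 4.8 | 3.4 | 1.6 | 0.2 | 0 |
| 12 | 4.8 | 3.0 | 1.4 | 0.1 | 0 |
| 13 | 4.3 | 3.0 | 1.1 | 0.1 | 0 |
| 14 | 5.8 | 4.0 | 1.2 | 0.2 | 0 |
| 15 | 5.7 | 4.4 | 1.5 | 0.4 | 0 |
| 16 | 5.4 | 3.9 | 1.3 | 0.4 | 0 |
| 17 | 5.1 | 3.5 | 1.4 | 0.3 | 0 |
| 18 | 5.7 | 3.8 | 1.7 | 0.3 | 0 |
| 19 | 5.1 | 3.8 | 1.5 | 0.3 | 0 |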


In [36]:
# Selecting columns
iris["sepal length (cm)"]        # Single column (Series)
iris[["sepal length (cm)", "petal length (cm)"]]   # Multiple columns (DataFrame)

| | sepal length (cm) | petal length (cm) |
|---|---|---|
| 0 | 5.1 | 1.4 |
| 1 | 4.9 | 1.4 |
| 2 | 4.7 | 1.3 |
| 3 | 4.6 | 1.5 |
| 4 | 5.0 | 1.4 |
| ... | ... | ... |
| 145 | 6.7 | 5.2 |
| 146 | 6.3 | 5.0 |
| 147 | 6.5 | 5.2 |
| 148 | 6.2 | 5.4 |


In [37]:
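# Boolean selection
iris[iris["petal length (cm)"] > 1.5]   # rows with petal length above 1.5 cm

# Combine conditions with & and wrap each condition in parentheses.
# Only this last expression is displayed by the notebook.
iris[(iris["sepal length (cm)"] > 5.0) &
     (iris["petal length (cm)"] < 2.0)]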


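| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | 0 |
| 10 | 5.4 | 3.7 | 1.5 | 0.2 | 0 |
| 14 | 5.8 | 4.0 | 1.2 | 0.2 | 0 |
| 15 | 5.7 | 4.4 | 1.5 | 0.4 | 0 |
| 16 | 5.4 | 3.9 | 1.3 | 0.4 | 0 |
| 17 | 5.1 | 3.5 | 1.4 | 0.3 | 0 |
| 18 | 5.7 | 3.8 | 1.7 | 0.3 | 0 |
| 19 | 5.1 | 3.8 | 1.5 | 0.3 | 0 |
| 20 | 5.4 | 3.4 | 1.7 | 0.2 | 0 |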
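
It may help to see what the expression inside the brackets evaluates to. Each comparison produces a boolean Series (a "mask"), and `&` combines two masks row by row. The parentheses are needed because `&` binds more tightly than `>` and `<`. The sketch below uses a hypothetical four-row frame; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical four-row frame, used only for illustration.
df = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, 5.4, 7.0],
    "petal length (cm)": [1.4, 1.4, 1.7, 4.7],
})

# Each comparison produces a boolean Series, one True/False per row.
long_sepal = df["sepal length (cm)"] > 5.0
short_petal = df["petal length (cm)"] < 2.0
print(long_sepal.tolist())

# & combines masks row by row; | means "or" and ~ means "not".
both = df[long_sepal & short_petal]
print(both.index.tolist())

# Python's `and` does not work on Series and raises a ValueError.
try:
    df[long_sepal and short_petal]
except ValueError as err:
    print("ValueError:", err)
```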


### Exercises
1. Select the first 10 rows of `iris`.  
2. Select only `"sepal width (cm)"` and `"petal width (cm)"`.  
3. Create a DataFrame where `"sepal length (cm)" > 6.0`.  
4. Bonus: Select rows where `"petal length (cm)" > 1.5` AND `"sepal width (cm)" < 3.0`.


In [65]:
# Question 1.
ten_rows = iris.iloc[0:10]

print(f"First ten rows of iris:\n {ten_rows}")

# Question 2.

width = iris[["sepal width (cm)", "petal width (cm)"]]

print(f"Sepal width (cm) and petal width (cm):\n {width}")

# Question 3.

length_data_frame = iris[iris["sepal length (cm)"] > 6.0]

print(f"Sepal length > 6.0cm:\n {length_data_frame}")

# Question 4. 

length_width = iris[(iris["petal length (cm)"] > 1.5) & (iris["sepal width (cm)"] < 3.0)]

print(f"Petal length > 1.5cm and sepal width < 3.0cm:\n {length_width}")


First ten rows of iris:
    sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   
5                5.4               3.9                1.7               0.4   
6                4.6               3.4                1.4               0.3   
7                5.0               3.4                1.5               0.2   
8                4.4               2.9                1.4               0.2   
9                4.9               3.1                1.5               0.1   

   target  
0       0  
1       0  
2       0  
3       0  
4       0  
5       0  
6       0  
7       0

# Section 6: Reflection & Next Steps

**Objectives:**
- Reflect on today’s practice.
- Prepare for next lecture (Filtering & Transformation).

### Reflection Questions
1. What did you learn about DataFrames today?  
2. How did you use AI as a tool (if at all)?  
3. Where did you get stuck? How did you solve it?  

### Next Lecture Preview
- Filtering data with conditions.
- Transforming columns with `.apply()`.
- Grouping and aggregating data with `.groupby()`.
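
As a small taste of the two methods named above, here is a sketch on a hypothetical four-row frame. The new column name `petal length (mm)` is an assumption made for illustration; the next lecture may use different examples.

```python
import pandas as pd

# Hypothetical four-row frame, used only for illustration.
df = pd.DataFrame({
    "petal length (cm)": [1.4, 1.6, 4.5, 4.7],
    "target": [0, 0, 1, 1],
})

# .apply() runs a function on every value of a column (here: cm -> mm).
df["petal length (mm)"] = df["petal length (cm)"].apply(lambda cm: cm * 10)

# .groupby() splits rows by a key, then aggregates each group.
mean_by_target = df.groupby("target")["petal length (cm)"].mean()
print(mean_by_target)
```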


Section 6 Reflection Questions:

1. I learned that DataFrames are two-dimensional data structures with row indices and column labels. Because they can hold different data types and formats, they are versatile to examine. Viewed in a spreadsheet (Excel, Google Sheets, etc.), they look like tables where rows make up observations and columns make up the variables that define those observations. By referencing different columns and rows, one can effectively handle datasets that hold a lot of nuanced detail. If there is ever a need to filter, merge/group, clean, or aggregate parts of the data, combining different Series in the DataFrame makes it possible to do so. From this, we get analytics that surface actionable insights (conclusions drawn from the data that were not particularly obvious beforehand).

2. I didn't use AI as a tool for this particular assignment. However, in the future I see myself using it to understand the full capabilities of built-in functions and libraries that I encounter in my work. This way, even when I get stuck and feel that I have to rely on AI, I am still engaging with the learning process as much as possible (building a foundation).

3. I got stuck on the last exercise, which required me to select rows where petal length was greater than 1.5 cm and sepal width was less than 3.0 cm. I didn't yet understand DataFrame syntax well, and it was confusing to figure out how to nest one condition inside the selection while joining conditions with logical operator symbols. However, after revisiting the notes given to us and seeing how the syntax changes with the number of conditions inside the brackets, I was able to fix this and produce the correct output.
