In [None]:
# In colab run this cell first to setup the file structure!
%cd /content
!rm -rf MOL518-Intro-to-Data-Analysis

!git clone https://github.com/shaevitz/MOL518-Intro-to-Data-Analysis.git
%cd MOL518-Intro-to-Data-Analysis/Lecture_1

# Lecture 1: Python, Jupyter, Colab, and the Programming Mindset

<p align="center">
<img src="media/ProgrammingLogos.png" alt="Logos" width="600" />
</p>

In this class, we will learn to code in Python using Jupyter Notebooks running in Google Colab. These notebooks let us combine code, explanatory text, figures, and results in a single, readable document, which makes them well suited for learning, exploration, and real data analysis.

The goals of this lecture are to
1.	Understand what Python, Jupyter Notebooks, and Google Colab are and how they fit together.
2.	Learn how to work in a notebook, including editing cells and using both code and Markdown text.
3.	Begin developing the habit of thinking like a computer so that you can design clear, explicit algorithms.
4.	Take first steps toward using AI tools productively and debugging when things go wrong.

**Note:** This course will be a very quick and dirty introduction to programming in Python. Our goal is to remove some of the fear and uncertainty of starting out so that you can continue learning on your own afterwards in support of your research. Whether you go on to use modern machine learning tools or just write scripts in ImageJ or Excel, the basic computing ideas we see here should carry over and allow you to jump in.

## Data Analysis is an Integral Part of the Scientific Process

<p align="center">
<img src="media/ScientificCycle.png" alt="Scientific Cycle" width="400" />
</p>

- Modern science is an iterative cycle, not a straight line from question to answer.
- Hypotheses motivate experimental design, which leads to data acquisition.
- **Data analysis sits at the same level as these steps**, shaping what we see, what we trust, and what we do next.
- Analysis transforms raw measurements into structure, patterns, and quantitative comparisons.
- The results of analysis inform interpretation, refine hypotheses, and often drive new experiments.
- Because analysis feeds back into every other stage, it is not a final step but a continuous partner in scientific reasoning.



## A note on using AI for programming

There is no doubt that generative AI tools (LLMs like ChatGPT, Codex, Claude, Gemini, etc.) have revolutionized coding. You can, should, and will need to learn to use AI in your coding as you move forward in this course and beyond. AI-assisted coding is now common practice. In many real projects, people work “agentically” with AI: you describe what you want to do, ask for help exploring or writing code, and then carefully read, test, and correct the results. The thinking and responsibility stay with you, but AI can speed up the trial and error and help you learn more efficiently. Agentic workflows are too advanced for this course, but LLMs are fantastic tools for helping you code: explaining how a function works, finding bugs, suggesting syntax, etc. (See the syllabus for the explicit AI-use policy for this course.)

> You still need to know how to read and understand code in order to interpret what the AI did and whether you think it is correct/appropriate. In that sense, we need to develop the skills to *read* Python. In this course, we will tell you when you are explicitly allowed to use AI to generate code for a task.


## What is Python?

- Python is a programming language we will use to analyze data, build simple models, and make quantitative plots.
- It is a way of giving a computer **precise, explicit instructions**. Computers are fast and reliable, but they do exactly what you tell them to do...
- Learning Python is therefore about learning how to **think step by step** and translate ideas into clear procedures.
- Python is widely used in the sciences and engineering because it is readable, flexible, and supported by powerful scientific libraries.
- In this course, we will use Python to load real biological data, perform calculations, and visualize results.
- No prior programming experience is expected. We will build concepts gradually and focus on understanding what each line of code does.

## Why We Use Notebooks in Scientific Computing

- Scientific work is rarely linear. We try an idea, look at the result, notice something unexpected, and change course.
- Analysis usually cycles through loading data, exploring it, visualizing patterns, revising assumptions, and repeating the process.
- A notebook keeps **code, text, equations, figures, and results together** in the order you actually thought about them.
- This makes your work easier to understand later, both for you and for others, because the reasoning sits next to the computation.
- Many researchers do most of their real analysis in notebooks, even when final results are later packaged into scripts or full-blown applications.

## Jupyter in Google Colab
Jupyter notebooks are not meant to be opened as plain text; they need an interface program to interpret, display, and run the code.

### Jupyter

- Jupyter (the name refers to three languages: Julia, Python, and R) is the underlying notebook system.
- It provides an interactive interface where code is run "cell" by "cell" and results appear immediately.
- Code, explanation, and figures live together in a single document.
- This makes Jupyter well suited for data exploration, debugging, and scientific communication.
- Jupyter can run locally on your laptop or on a remote server.
- Examples include shared servers, high performance clusters, and cloud systems.
- Notebooks often become living documents that evolve with a project and record both analysis and reasoning over time.

### Google Colab

- Google Colab is one way to run Jupyter notebooks.
- It is a hosted Jupyter environment accessed through a web browser; no local installation is required.
- Notebooks are saved automatically to your Google Drive.
- Sharing and collaboration are straightforward.

## Understanding the Parts of a Notebook
A notebook is a sequence of **cells**. There are two basic types:

1. **Markdown cells** for text, equations, images, and descriptions.
2. **Code cells** for executable Python code.

You run cells in sequence: one at a time, in groups, or the whole notebook at once.

>Shift-Enter runs the current cell (either formats the Markdown or runs the code) and moves to the next cell.

#### Markdown Cell Example
<p align="center">
<img src="media/MarkdownExample.png" alt="Markdown Example" width="400" />
</p>

#### Code Cell Example

<p align="center">
<img src="media/CodeExample.png" alt="Code Example" width="500" />
</p>


### Markdown Cell Example (Double-click cell to "unformat")
Markdown lets us write formatted text:
- Bold and italics
- Headings
- Lists (like this one)

It can also display mathematics:
$I(t) = I_0 e^{-kt}$
and render images <img src="media/PUShield.png" alt="PU shield" width="50" />.

For a quick reference to Markdown syntax, see [Markdown Guide](https://www.markdownguide.org/basic-syntax/).

In [None]:
# This is a code cell. Run me with Shift+Enter.
print("You just executed some Python code inside a notebook!")

## Thinking Like a Computer
Programming requires a shift in perspective. Humans are used to filling in "gaps". (Pre-AI) Computers do not. **They follow each instruction literally.**

Consider the instruction: *"Find the average of these numbers."* A computer needs to perform the following exact steps:
1. Add the numbers.
2. Count how many there are.
3. Divide the total by the count.
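
The three steps above can be spelled out explicitly in Python. This is a minimal sketch that assumes a small, made-up list of measurements called `values` (the square brackets create a *list*, which we will meet properly later). The built-in `sum()` and `len()` functions handle steps 1 and 2; the point is that each step is written out separately:

```python
# Hypothetical measurements; any numbers would do
values = [2.0, 4.0, 9.0]

# Step 1: add the numbers
total = sum(values)

# Step 2: count how many there are
count = len(values)

# Step 3: divide the total by the count
average = total / count

print("The average is", average)   # The average is 5.0
```

Nothing here is guessed by the computer: if you left out step 2, Python would have no idea what to divide by.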


## Algorithms as Recipes
A good cooking recipe tells you precisely what to do at each step. Algorithms are no different. Good programming means writing clear procedures for the computer.

Let's warm up with our first interactions.


In [None]:
# You can have Python print things for you with the print() function.
print("Hello from Python!")

## Comments Make Code Understandable
A **comment** in a code cell begins with `#` and all following text on that line is ignored by Python. 

Use them to explain *why* a section of code exists, clarify non-obvious choices, document assumptions, and leave useful notes for your later self and collaborators. Keep comments short and accurate.

- Inline comment:
```python
x = x + 1  # increment to account for zero-based indexing
```

- Block-style comments:
```python
# Loop over dataset to compute mean values.
# Uses numpy for numerically stable calculations.
for col in data.columns:
    ...
```
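
To convince yourself that Python really ignores comments, try running a cell like the one below. It is a small sketch; the variable name `cell_count` is made up for illustration. Only one line is printed, because the first line is entirely a comment and the `#` on the second line ends the code on that line:

```python
# print("This line is a comment, so Python never runs it")
cell_count = 12  # Python stops reading this line at the hash symbol
print("Cell count:", cell_count)
```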

## Variables and Basic Operations
Printing a predetermined sentence is OK, but variables let us do much more powerful things.

Variables let us store various kinds of information for later use, manipulation, printing, plotting etc. They can be as simple as a single number, or a complex combination of numbers, text, and other kinds of data.
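
As a short illustration, here are three variables holding different kinds of information. The names and values are hypothetical. Note that a variable can be updated: on the line `colonies = colonies + 3`, Python first evaluates the right-hand side using the current value, then stores the result back under the same name:

```python
colonies = 12       # a whole number (an integer)
od600 = 0.45        # a decimal number (a float)
strain = "MG1655"   # text (a string)

print("Strain", strain, "gave", colonies, "colonies at OD600", od600)

# Update a variable using its own current value
colonies = colonies + 3
print("Updated colony count:", colonies)
```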

### Simple math
Python supports arithmetic directly. For a concise reference to numeric operators and basic math functions in Python, see: https://docs.python.org/3/tutorial/introduction.html#numbers


In [None]:
a = 10
b = 4
print("a + b =", a + b) # adds a and b and prints the sum
print("a * b =", a * b) # multiplies a and b and prints the product
print("a / b =", a / b) # divides a by b and prints the ratio

Try modifying `a` and `b` above and re-running the cell.
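
Beyond `+`, `*`, and `/`, the reference linked above describes a few more arithmetic operators. This sketch resets `a` and `b` so the cell stands on its own. The `**` operator is the one used later in this lecture for `radius**2`:

```python
a = 10
b = 4
print("a - b  =", a - b)    # subtraction: 6
print("a ** b =", a ** b)   # exponentiation, a to the power b: 10000
print("a // b =", a // b)   # floor division, rounds down to a whole number: 2
print("a % b  =", a % b)    # remainder after division (modulo): 2
```

Compare `a // b` with `a / b` from the cell above: `/` always gives a decimal result (`2.5`), while `//` discards the fractional part.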


## Numbers vs Strings
Strings are sequences of characters, i.e., text. Strings behave differently from numerical variables:

In [None]:
number = 5
text = "5"

print(number * 3)    # mathematical multiplication
print(text * 3)      # string "repetition"
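
When you are unsure what kind of value a variable holds, you can ask Python with the built-in `type()` function. A short sketch reusing the two variables above. It also shows that `+` between two strings joins them rather than adding numbers:

```python
number = 5
text = "5"

print(type(number))   # <class 'int'>, an integer
print(type(text))     # <class 'str'>, a string
print(text + text)    # joins the two strings: 55, not 10
```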

### Exercise 1

Write code using variables to display the product of 17 and 3, followed by the word "Tigers" repeated 10 times.

In [None]:
# Your code goes here!





### Exercise 2
Compute the perimeter and area of a rectangle. Modify the code below to add a calculation of the perimeter of the rectangle.

In [None]:
length = 5
width = 2

# Area calculation
area = length * width
print("The area is", area)

# Perimeter calculation





## Errors and How to Read Them
Everyone makes mistakes (bugs), and when you code they happen often! It is typical for debugging to take the majority of your coding time, especially when you are first learning. Debugging can be very frustrating, but it is an essential skill. Indeed, as AI becomes more prevalent for writing code, one could argue that debugging is more important now than ever.

Let's look at an example:


In [None]:
radius = 2
pi = 3.14159

surface_area = 4 * Pi * radius**2

print("The sphere surface area is", surface_area)

Oops! That didn't run, and instead gave an error. Can you see what I did wrong?

The error message is informative: it tells me that I didn't define a variable called `Pi`. If you look carefully, I named it `pi`, without the capital P. **Variable names in Python are case sensitive!**
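
A minimal sketch of case sensitivity, using made-up names so that it does not interfere with the `pi` example above. Each spelling below is a completely separate variable:

```python
count = 1
Count = 2   # a different variable from 'count'
COUNT = 3   # and yet another one

print(count, Count, COUNT)   # 1 2 3
```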


### What the Error Message Tells You
- The type of error
- The line where it occurred
- A descriptive message

Debugging is a core skill. Errors are not failures; they are feedback.


## Debugging with AI Tools

One way that AI can help is to interpret errors and suggest fixes. You can paste the error message (sometimes called a Traceback) into ChatGPT and ask what might be wrong. 

Let's try that with our Pi error code:


#### Query:
```
What does this error in my code mean?

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[1], line 4
      1 radius = 2
      2 pi = 3.14159
----> 4 surface_area = 4 * Pi * radius**2
      6 print("The surface area is", surface_area)

NameError: name 'Pi' is not defined
```

#### Response:

This error is Python being very literal about names.

You defined a variable called `pi` on line 2, all lowercase:

```python
pi = 3.14159
```

But on line 4 you used `Pi` with a capital P:

```python
surface_area = 4 * Pi * radius**2
```

In Python, capitalization matters. `pi` and `Pi` are treated as completely different names. Since `Pi` was never defined anywhere, Python raises a `NameError` and tells you exactly that: it does not know what `Pi` refers to.

**How to fix it**

Use the same name consistently:

```python
surface_area = 4 * pi * radius**2
```

**Big picture**

Python does not infer intent or meaning. It only checks whether a name has been defined earlier in the notebook or script, with exactly the same spelling and capitalization.

**Now go back and fix the surface area calculation and confirm it runs.**

### Exercise 3
Here is a code cell that calculates the total number of PhD students in LSI based on the enrollment of the QCB (42) and BPY (17) programs. Does it run? If not, fix it!

In [None]:
total_students = qcb + bpy
print("Total number of students in both classes is: ", total_students)

### Exercise 4

Why doesn't this cell run?

In [None]:
number = 5
text = "5"
print(text + number)