# Notebook 1: Introduction to Google Colab and Python Basics

## SECTION 1: GETTING STARTED WITH GOOGLE COLAB

Welcome to your first Python for Data Analysis class! This notebook will introduce you to Google Colab and Python basics, with a special focus on helping SQL users transition to Python.

In [None]:
# This is a code cell. You can run Python code here by clicking the play button
# on the left or pressing Shift+Enter.
print("Welcome to Google Colab!")

# EXERCISE: Run this cell to verify your environment is working properly.

Welcome to Google Colab!


In [None]:
# Google Colab comes with many Python libraries preinstalled
# Let's check our Python version
import sys
print(f"Python version: {sys.version}")

Python version: 3.13.2 (tags/v3.13.2:4f8bb39, Feb  4 2025, 15:23:48) [MSC v.1942 64 bit (AMD64)]


## SECTION 2: PYTHON VARIABLES AND DATA TYPES

In Python, you can create variables without declaring their type. This is different from SQL, where every column's type must be defined when the table is created.
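
As a minimal sketch of what this means in practice, the same name can be rebound to values of different types, and `type()` reports the type of whatever the name currently refers to. The variable name `order_value` is hypothetical and used only for illustration.

```python
# Hypothetical variable name, used only for illustration.
order_value = 42                          # the name is bound to an int
first_type = type(order_value).__name__

order_value = "forty-two"                 # the same name is now bound to a str
second_type = type(order_value).__name__

print(first_type, second_type)
```

No declaration or conversion step is needed between the two assignments, which is the main difference from a typed SQL column.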

In [None]:
# Numeric types
integer_value = 42
float_value = 3.14159

# Text type
text_string = "Hello, Data Analyst!"
another_string = 'Single quotes work too'

# Boolean type
is_active = True
has_data = False

# The print() function outputs values to the console
print(integer_value)
print(float_value)
print(text_string)
print(is_active)

42
3.14159
Hello, Data Analyst!
True


In [None]:
# You can check variable types with the type() function
print(type(integer_value))  # <class 'int'>
print(type(float_value))    # <class 'float'>
print(type(text_string))    # <class 'str'>
print(type(is_active))      # <class 'bool'>

<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>


### EXERCISE: Create your own variables with different data types and print their values and types.

Try creating variables to represent data you might find in an e-commerce dataset (like those in the Olist database).

In [None]:
# Your code here:



## SECTION 3: BASIC OPERATIONS

In [None]:
# Arithmetic operations
addition = 10 + 5  # assign the result to a variable
subtraction = 10 - 5
multiplication = 10 * 5
division = 10 / 5  # Returns float
integer_division = 10 // 3  # Returns integer part only
remainder = 10 % 3  # Returns remainder
exponentiation = 10 ** 2  # 10 to the power of 2

print(f"Addition: {addition}")  # use the variable's value inside an f-string
print(f"Subtraction: {subtraction}")
print(f"Multiplication: {multiplication}")
print(f"Division: {division}")
print(f"Integer Division: {integer_division}")
print(f"Remainder: {remainder}")
print(f"Exponentiation: {exponentiation}")

Addition: 15
Subtraction: 5
Multiplication: 50
Division: 2.0
Integer Division: 3
Remainder: 1
Exponentiation: 100


In [None]:
# String operations
first_name = "Data"
last_name = "Analyst"

# Concatenation
full_name = first_name + " " + last_name
print(full_name)


Data Analyst


In [None]:

# String methods
print(full_name.upper())  # Convert to uppercase
print(full_name.lower())  # Convert to lowercase
print(len(full_name))     # String length
print(full_name.replace("Data", "Python"))  # Replace substring


DATA ANALYST
data analyst
12
Python Analyst


In [None]:

# String formatting (f-strings, introduced in Python 3.6)
age = 25
experience = 3.5
intro = f"I am a {full_name}, {age} years old with {experience} years of experience."
print(intro)

I am a Data Analyst, 25 years old with 3.5 years of experience.
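
Later cells in this notebook use format specifiers such as `:.2f` and `:.1%` inside f-strings. The sketch below shows what a few common specifiers produce; the variable names and values are hypothetical.

```python
# Hypothetical values, used only for illustration.
unit_price = 1234.5
discount_rate = 0.175

price_text = f"{unit_price:.2f}"        # fixed-point with two decimal places
grouped_text = f"{unit_price:,.2f}"     # thousands separator plus two decimals
percent_text = f"{discount_rate:.1%}"   # multiply by 100 and append a percent sign

print(price_text)
print(grouped_text)
print(percent_text)
```

The part after the colon controls only how the value is displayed; the underlying number is unchanged.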


### EXERCISE: Create and manipulate your own variables to practice these operations.

Try performing calculations you might do with an e-commerce dataset, like calculating total order values or formatting product information.

In [None]:
# Your code here:



## SECTION 4: DATA STRUCTURES

### LISTS
Lists are similar to arrays in other languages, but a single list can hold values of different data types.

In [None]:
# Creating a list
products = ["Laptop", "Headphones", "Mouse", "Keyboard"]
prices = [1200, 150, 25, 60]
mixed_list = ["Product", 42, 3.14, True, None]

print(products)
print(prices)
print(mixed_list)


['Laptop', 'Headphones', 'Mouse', 'Keyboard']
[1200, 150, 25, 60]
['Product', 42, 3.14, True, None]


In [None]:

# Accessing list elements (0-indexed)
print(f"First product: {products[0]}")  # Laptop
print(f"Last product: {products[-1]}")  # Keyboard


First product: Laptop
Last product: Keyboard


In [None]:

# Slicing lists [start:end:step]
print(f"First two products: {products[0:2]}")  # ['Laptop', 'Headphones']
print(f"Every other price: {prices[::2]}")     # [1200, 25]

First two products: ['Laptop', 'Headphones']
Every other price: [1200, 25]


In [None]:
products

['Laptop', 'Headphones', 'Mouse', 'Keyboard']

In [None]:
# List methods
products.append("Monitor")  # Add to end
print(products)


['Laptop', 'Headphones', 'Mouse', 'Keyboard', 'Monitor']


In [None]:

products.insert(1, "Tablet")  # Insert at specific position
print(products)


['Laptop', 'Tablet', 'Headphones', 'Mouse', 'Keyboard', 'Monitor']


In [None]:

products.remove("Mouse")  # Remove specific value
print(products)


['Laptop', 'Tablet', 'Headphones', 'Keyboard', 'Monitor']


In [None]:

popped_item = products.pop()  # Remove and return last item
print(f"Removed: {popped_item}, Remaining: {products}")


Removed: Monitor, Remaining: ['Laptop', 'Tablet', 'Headphones', 'Keyboard']


In [None]:

# List operations
total_items = len(products)
print(f"Total products: {total_items}")


Total products: 4


In [None]:
products

['Laptop', 'Tablet', 'Headphones', 'Keyboard']

In [None]:
len('girl')

4

In [None]:
prices

[1200, 150, 25, 60]

In [None]:

sorted_prices = sorted(prices, reverse=True)  # Sort in descending order
print(f"Sorted prices: {sorted_prices}")

Sorted prices: [1200, 150, 60, 25]


### DICTIONARIES
Dictionaries store key-value pairs, much like a JSON object.

In SQL terms, a dictionary resembles a single row of a table, with the column names as keys.
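
To make the row analogy concrete, here is a small sketch in which a list of dictionaries stands in for a table. The `rows` data is hypothetical and is not taken from any real dataset.

```python
# Hypothetical rows: each dictionary plays the role of one table row.
rows = [
    {"product_id": 1, "name": "Laptop", "price": 1200},
    {"product_id": 2, "name": "Mouse", "price": 25},
]

first_row = rows[0]
column_names = list(first_row.keys())   # the "column names" of the row

print(first_row["name"])                # look up a "column" by its key
print(column_names)
```

Every dictionary in the list uses the same keys, which is what makes the collection behave like a table.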

In [None]:
product = {
    "name": "Laptop",
    "brand": "TechBrand",
    "price": 1200,
    "in_stock": True,
    "specs": {
        "cpu": "Intel i7",
        "ram": "16GB",
        "storage": "512GB SSD"
        # add more specifications here
    }
}

print(product)


{'name': 'Laptop', 'brand': 'TechBrand', 'price': 1200, 'in_stock': True, 'specs': {'cpu': 'Intel i7', 'ram': '16GB', 'storage': '512GB SSD'}}


In [None]:

# Accessing dictionary values
print(f"Product name: {product['name']}")
print(f"CPU: {product['specs']['cpu']}")


Product name: Laptop
CPU: Intel i7


In [None]:

# Dictionary methods
print(f"Keys: {product.keys()}")
print(f"Values: {product.values()}")
print(f"Items: {product.items()}")


Keys: dict_keys(['name', 'brand', 'price', 'in_stock', 'specs'])
Values: dict_values(['Laptop', 'TechBrand', 1200, True, {'cpu': 'Intel i7', 'ram': '16GB', 'storage': '512GB SSD'}])
Items: dict_items([('name', 'Laptop'), ('brand', 'TechBrand'), ('price', 1200), ('in_stock', True), ('specs', {'cpu': 'Intel i7', 'ram': '16GB', 'storage': '512GB SSD'})])


In [None]:

# Adding/updating values
product["color"] = "Silver"
product["price"] = 1100  # Update existing key
print(product)


{'name': 'Laptop', 'brand': 'TechBrand', 'price': 1100, 'in_stock': True, 'specs': {'cpu': 'Intel i7', 'ram': '16GB', 'storage': '512GB SSD'}, 'color': 'Silver'}


In [None]:

# Safe access with get() (returns None or default if key doesn't exist)
print(product.get("price", "Not specified"))

1100
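
The call above uses a key that exists, so the default is never needed. The sketch below shows what `get()` returns when the key is missing; the `item` dictionary is hypothetical and independent of the `product` variable.

```python
# Hypothetical dictionary, independent of the `product` variable above.
item = {"name": "Laptop", "price": 1100}

missing_default = item.get("warranty")                   # key absent, no default given
missing_custom = item.get("warranty", "Not specified")   # key absent, default supplied
present = item.get("price", "Not specified")             # key present, default ignored

print(missing_default)
print(missing_custom)
print(present)
```

By contrast, `item["warranty"]` would raise a `KeyError`, which is why `get()` is the safer choice when a key may be absent.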


### EXERCISE: Create your own lists and dictionaries and practice accessing and modifying them.

Try creating data structures that might represent e-commerce orders, customers, or products.

In [None]:
# Your code here:



In [None]:
name = "Prince"
print("My name is ", name) # pass the variable to print() as a second argument
print('My name is ', name , 'and I am a ', name)
print()
print(f"My 'name' is {name} and I am a {name}") # f-string format
# He said "Hello, here am I, but 'I am not showing up!'"

My name is  Prince
My name is  Prince and I am a  Prince

My 'name' is Prince and I am a Prince


## SECTION 5: CONTROL FLOW

### IF STATEMENTS

An `if` statement says: if this condition holds, do this; otherwise, do something else.

For example: if the price of the pen is N50, then buy it; otherwise, bring back my money.

- condition: the price of the pen is N50
- action: buy the pen
- counter action: bring back the money
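
The pen example can be sketched in Python as follows. The variable names and the N50 budget are assumptions made for illustration.

```python
# Hypothetical values for the pen example; prices are in naira.
pen_price = 50
budget = 50

if pen_price == budget:                  # condition
    decision = "buy the pen"             # action
else:
    decision = "bring back the money"    # counter action

print(decision)
```

Changing `pen_price` to any other value makes the condition false, so the `else` branch runs instead.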

In [None]:
price = 10

if price > 100: # condition
    print("High price") # consequence
elif price > 50:
    print("Medium price")
else:
    print("Low price")


Low price


In [None]:
True and True # True
True or False # True
False or True # True

True

In [None]:

# Multiple conditions
in_stock = False
on_sale = False

if in_stock and on_sale:
    print("Available and on sale!")
elif in_stock or on_sale:
    print("Either available or on sale")
else:
    print("Neither available nor on sale")

Neither available nor on sale


### LOOPS

In [None]:
products

['Laptop', 'Tablet', 'Headphones', 'Keyboard']

In [None]:
# intention: add 2 to numbers greater than 6 but less than 15
# print(f"2 + 7 = {2 + 7}")
# print(f"2 + 8 = {2 + 8}")
# print(f"2 + 9 = {2 + 9}")
# print(f"2 + 10 = {2 + 10}")
# print(f"2 + 11 = {2 + 11}")
# print(f"2 + 12 = {2 + 12}")
# print(f"2 + 13 = {2 + 13}")
# print(f"2 + 14 = {2 + 14}")

# numbers_gt_6 = [7, 8, 9, 10, 11, 12, 13, 14]
for number in range(7,15):
    result = 2 + number
    print(f"2 + {number} = {result}")


[i for i in range(7,15)] # 7 to 14
# List comprehension
# [expression for item in iterable if condition]

2 + 7 = 9
2 + 8 = 10
2 + 9 = 11
2 + 10 = 12
2 + 11 = 13
2 + 12 = 14
2 + 13 = 15
2 + 14 = 16


[7, 8, 9, 10, 11, 12, 13, 14]
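
The comprehension pattern `[expression for item in iterable if condition]` can replace the loop above when the goal is to build a list. This is a small sketch; the names `sums` and `even_numbers` are illustrative.

```python
# Same range as the loop above: 7 through 14.
sums = [2 + number for number in range(7, 15)]

# The optional condition keeps only the items that pass the test.
even_numbers = [number for number in range(7, 15) if number % 2 == 0]

print(sums)
print(even_numbers)
```

The first comprehension applies an expression to every item; the second keeps the items unchanged but filters them.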

In [None]:
# For loop with list
print("Products:")
for product in products:
    print(f"- {product}")


Products:
- Laptop
- Tablet
- Headphones
- Keyboard


In [None]:

# For loop with range
print("Numbers from 0 to 4:")
for i in range(5):  # 0, 1, 2, 3, 4
    print(i)


Numbers from 0 to 4:
0
1
2
3
4


In [None]:

# For loop with range (start, stop, step)
print("Even numbers from 2 to 10:")
for i in range(2, 11, 2):  # 2, 4, 6, 8, 10
    print(i)


Even numbers from 2 to 10:
2
4
6
8
10


In [None]:
product = {
    "name": "Laptop",
    "brand": "TechBrand",
    "price": 1200,
    "in_stock": True,
    "specs": {
        "cpu": "Intel i7",
        "ram": "16GB",
        "storage": "512GB SSD"
        # add more specifications here
    }
}

In [None]:
type(product) == dict # True
isinstance(product, dict)

True
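
Both checks above give the same answer for a plain dictionary, but they are not equivalent in general. The sketch below uses a boolean to show the difference: `isinstance()` also accepts subclasses, while comparing with `type()` requires an exact match.

```python
flag = True

exact_match = type(flag) == int           # False: the exact type is bool, not int
subclass_match = isinstance(flag, int)    # True: bool is a subclass of int

print(exact_match)
print(subclass_match)
```

For this reason `isinstance()` is usually the preferred check, as in the dictionary loop that follows.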

In [None]:

# For loop with dictionary
print("Product details:")
for key, value in product.items():
    # Skip nested dictionaries for cleaner output
    if isinstance(value, dict):
        continue
    print(f"{key}: {value}")

Product details:
name: Laptop
brand: TechBrand
price: 1200
in_stock: True


You are given a dictionary where the keys are days of the week and the values are daily sales figures.

Write a Python program that goes through the dictionary and prints:

The day and `"High"` if the value is greater than 1000

The day and `"Medium"` if the value is between 500 and 1000 (inclusive)

The day and `"Low"` if the value is less than 500

```
sales = {
    "Monday": 1200,
    "Tuesday": 750,
    "Wednesday": 300,
    "Thursday": 1020,
    "Friday": 499
}
```

In [None]:
sales_dict = {
    "Monday": 1200,
    "Tuesday": 750,
    "Wednesday": 300,
    "Thursday": 1020,
    "Friday": 499
}


In [None]:
sales_dict.values() # returns all values in the dictionary

dict_values([1200, 750, 300, 1020, 499])

In [None]:
for key, value in sales_dict.items():
    if value > 1000:
        print(f"{key}: High")
    elif 500 <= value <= 1000:
        print(f"{key}: Medium")
    else:
        print(f"{key}: Low")

Monday: High
Tuesday: Medium
Wednesday: Low
Thursday: High
Friday: Low


In [None]:
for value in sales_dict.values():
    if value > 1000:
        print("High")
    elif 500 <= value <= 1000:
        print("Medium")
    else:
        print("Low")

High
Medium
Low
High
Low


In [None]:
# While loop
count = 5
print("Countdown:")
while count > 0: # 5 > 0
    print(count)
    count -= 1 # is the same as  count = count - 1
print("Go!")

# While loop
# goal: count from 1 to 5
count = 1 # initialize the variable
print("Count up:")
while count <= 5: # 1 <= 5
    print(count)
    count += 1 # same as count = count + 1: the incremented count is checked against the while condition again
print("Done!")


Countdown:
5
4
3
2
1
Go!
Count up:
1
2
3
4
5
Done!


In [None]:
5%2 # 5 divided by 2 is 2 with remainder 1
# dividend ÷ divisor = quotient ... remainder
# 5 ÷ 2 = 2 ... 1
5%2 # 1 = remainder --> modulus operator
5/2 # 2.5 = float
5//2 # 2 = quotient --> integer division operator

2
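
The quotient and remainder are related by `dividend == quotient * divisor + remainder`. As a short sketch, the built-in `divmod()` returns both at once; the numbers are arbitrary examples.

```python
dividend = 17
divisor = 5

# divmod() returns the quotient and remainder together as a tuple.
quotient, remainder = divmod(dividend, divisor)
rebuilt = quotient * divisor + remainder   # recombines to the original dividend

print(quotient, remainder)
print(rebuilt == dividend)
```

A remainder of zero means the dividend is exactly divisible, which is the test used in the loop-control cell that follows.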

In [None]:

# Loop control
print("Numbers divisible by 3 (from 1 to 10):")
for i in range(1, 11):
    if i % 3 != 0:
        continue  # Skip to next iteration
    print(i)


Numbers divisible by 3 (from 1 to 10):
3
6
9


In [None]:

print("Loop until finding 'Mouse':")
for item in ["Laptop", "Keyboard", "Mouse", "Monitor"]:
    print(item)
    if item == "Mouse":
        print("Found Mouse!")
        break  # Exit loop

Loop until finding 'Mouse':
Laptop
Keyboard
Mouse
Found Mouse!


### EXERCISE: Write your own control flow examples with if statements and loops.

Try creating examples that handle situations from an e-commerce context, like processing orders with different statuses or calculating shipping based on customer location.

In [None]:
# Your code here:



## SECTION 6: FUNCTIONS

In [None]:
# Defining a simple function
def greet(name):
    return f"Hello, {name}!"


In [None]:
print(greet(name="Data Analyst"))  # Call the function with an argument

Hello, Data Analyst!


In [None]:

# Calling the function
message = greet("Data Analyst")
print(message)

Hello, Data Analyst!


In [None]:
# Function with multiple parameters
def calculate_total(price, quantity, tax_rate=0.1):
    subtotal = price * quantity
    tax = subtotal * tax_rate
    return subtotal + tax



In [None]:

# Calling with positional arguments
total1 = calculate_total(19.99, 3)
print(f"Total 1: NGN{total1:.2f}")


Total 1: NGN65.97


In [None]:

# Calling with keyword arguments
total2 = calculate_total(quantity=2, tax_rate=0.05, price=24.99)
print(f"Total 2: ${total2:.2f}")

Total 2: $52.48


In [None]:
products

['Laptop', 'Tablet', 'Headphones', 'Keyboard']

In [None]:
# Function with multiple return values
def get_product_stats(products):
    count = len(products)
    if count == 0:
        return 0, None, None

    min_product = min(products, key=len)
    max_product = max(products, key=len)
    return count, min_product, max_product


In [None]:
output = get_product_stats(["Laptop", "Mouse", "Keyboard", "Monitor"])

In [None]:
count = output[0]
shortest = output[1]
longest = output[2]
# Unpacking: assign each element to its own variable in one step
count, shortest, longest = output

In [None]:

# Unpacking multiple return values
count, shortest, longest = get_product_stats(["Laptop", "Mouse", "Keyboard", "Monitor"])
print(f"Products: {count}, Shortest: {shortest}, Longest: {longest}")

Products: 4, Shortest: Mouse, Longest: Keyboard


### EXERCISE: Write your own functions to solve problems related to data processing.

Try creating functions that would be useful in an e-commerce context, like calculating shipping costs, formatting product details, or filtering products by category.

In [None]:
# Your code here:



## SECTION 7: SQL TO PYTHON TRANSLATION

In SQL, you use queries to retrieve and manipulate data. In Python, we often use pandas (imported as pd) for similar operations.

In [None]:
import pandas as pd

# Create a DataFrame (similar to a SQL table)
data = {
    'product_id': [1, 2, 3, 4, 5],
    'name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headphones'],
    'category': ['Computer', 'Accessory', 'Accessory', 'Computer', 'Audio'],
    'price': [1200, 25, 60, 300, 150],
    'in_stock': [True, True, False, True, True]
}

products_df = pd.DataFrame(data)
print("Products DataFrame:")
print(products_df)

Products DataFrame:
   product_id        name   category  price  in_stock
0           1      Laptop   Computer   1200      True
1           2       Mouse  Accessory     25      True
2           3    Keyboard  Accessory     60     False
3           4     Monitor   Computer    300      True
4           5  Headphones      Audio    150      True


In [None]:
# SQL: SELECT * FROM products
# Python:
print("\nAll products (SELECT *):\n")
print(products_df)



All products (SELECT *):

   product_id        name   category  price  in_stock
0           1      Laptop   Computer   1200      True
1           2       Mouse  Accessory     25      True
2           3    Keyboard  Accessory     60     False
3           4     Monitor   Computer    300      True
4           5  Headphones      Audio    150      True


In [None]:

# SQL: SELECT name, price FROM products
# Python:
print("\nSelect specific columns:\n")
print(products_df[['name', 'price']])



Select specific columns:

         name  price
0      Laptop   1200
1       Mouse     25
2    Keyboard     60
3     Monitor    300
4  Headphones    150


In [None]:

# SQL: SELECT * FROM products WHERE category = 'Accessory'
# Python:
print("\nFiltering (WHERE):\n")
accessories = products_df[products_df['category'] == 'Accessory']
print(accessories)


Filtering (WHERE):

   product_id      name   category  price  in_stock
1           2     Mouse  Accessory     25      True
2           3  Keyboard  Accessory     60     False
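
SQL `WHERE` clauses often combine conditions with `AND` or `OR`. In pandas the equivalent operators are `&` and `|`, with each comparison wrapped in parentheses. The sketch below rebuilds a small copy of the products table so it runs on its own; `sample_df` is a hypothetical name.

```python
import pandas as pd

# Hypothetical copy of the products table used in this section.
sample_df = pd.DataFrame({
    "name": ["Laptop", "Mouse", "Keyboard", "Monitor", "Headphones"],
    "category": ["Computer", "Accessory", "Accessory", "Computer", "Audio"],
    "price": [1200, 25, 60, 300, 150],
    "in_stock": [True, True, False, True, True],
})

# SQL: SELECT * FROM products WHERE category = 'Accessory' AND in_stock = TRUE
accessories_in_stock = sample_df[
    (sample_df["category"] == "Accessory") & sample_df["in_stock"]
]

# SQL: SELECT * FROM products WHERE price < 50 OR price > 1000
price_extremes = sample_df[(sample_df["price"] < 50) | (sample_df["price"] > 1000)]

print(accessories_in_stock)
print(price_extremes)
```

The parentheses matter because `&` and `|` bind more tightly than comparison operators such as `==` and `<`.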


In [None]:
# SQL: SELECT category, AVG(price) FROM products GROUP BY category
# Python:
print("\nGrouping and aggregation (GROUP BY):\n")
category_avg = products_df.groupby('category')['price'].mean().reset_index()
print(category_avg)

# SQL: SELECT * FROM products ORDER BY price DESC
# Python:
print("\nSorting (ORDER BY):\n")
sorted_products = products_df.sort_values('price', ascending=False)
print(sorted_products)

# SQL: SELECT category, COUNT(*) FROM products GROUP BY category HAVING COUNT(*) > 1
# Python:
print("\nGROUP BY with HAVING:\n")
category_counts = products_df.groupby('category').size().reset_index(name='count')
print(category_counts[category_counts['count'] > 1])
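
The individual translations above can be chained into one expression, much as clauses are combined in a single SQL query. This is a sketch under the assumption that the table matches the one used in this section; `sample_df` and `top_two` are hypothetical names.

```python
import pandas as pd

# Hypothetical copy of the products table used in this section.
sample_df = pd.DataFrame({
    "name": ["Laptop", "Mouse", "Keyboard", "Monitor", "Headphones"],
    "price": [1200, 25, 60, 300, 150],
    "in_stock": [True, True, False, True, True],
})

# SQL: SELECT name, price FROM products
#      WHERE in_stock = TRUE ORDER BY price DESC LIMIT 2
top_two = (
    sample_df[sample_df["in_stock"]]          # WHERE
    .sort_values("price", ascending=False)    # ORDER BY ... DESC
    .head(2)[["name", "price"]]               # LIMIT 2, then SELECT columns
)

print(top_two)
```

Each method returns a new DataFrame, so the steps read top to bottom in the order they are applied.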

### EXERCISE: Try writing your own pandas operations to mimic SQL queries.

Practice translating SQL queries you might use in your work to their pandas equivalents.

In [None]:
# Your code here:



## SECTION 8: PUTTING IT ALL TOGETHER

Let's create a function that performs a complete data analysis workflow.

In [None]:
def analyze_product_data(data):
    # Create DataFrame
    df = pd.DataFrame(data)

    # Calculate basic statistics
    total_products = len(df)
    avg_price = df['price'].mean()
    in_stock_count = df['in_stock'].sum()

    # Group by category
    category_stats = df.groupby('category').agg({
        'product_id': 'count',
        'price': ['mean', 'min', 'max'],
        'in_stock': 'sum'
    })

    # Print the formatted results
    print(f"Total Products: {total_products}")
    print(f"Average Price: ${avg_price:.2f}")
    print(f"In-Stock Products: {in_stock_count} ({in_stock_count/total_products:.1%})")
    print("\nCategory Statistics:")
    print(category_stats)

    # Return the most expensive product
    most_expensive = df.loc[df['price'].idxmax()]
    return most_expensive

# Run the analysis
print("\nProduct Analysis Results:")
top_product = analyze_product_data(data)
print("\nMost Expensive Product:")
print(top_product)


Product Analysis Results:
Total Products: 5
Average Price: $347.00
In-Stock Products: 4 (80.0%)

Category Statistics:
          product_id  price            in_stock
               count   mean  min   max      sum
category                                       
Accessory          2   42.5   25    60        1
Audio              1  150.0  150   150        1
Computer           2  750.0  300  1200        2

Most Expensive Product:
product_id           1
name            Laptop
category      Computer
price             1200
in_stock          True
Name: 0, dtype: object


### FINAL EXERCISE: Expand the analyze_product_data function to include more analyses.

For example, add calculations for median price, price ranges, or out-of-stock percentages.

In [None]:
# Your code here:



## NEXT STEPS

Congratulations on completing this introduction to Python and Google Colab!

### Tomorrow's class (Thursday, April 10)
We'll dive deeper into data structures and control flow, focusing on:
- Advanced data structures (nested lists and dictionaries)
- Complex control flow patterns
- List and dictionary comprehensions
- Error handling

### Assignment
Please complete the Week 1 Minor Assignment before next Wednesday's class. The instructions and starter code are available in the course repository.