# Lab 03

## Python Libraries and Packages

In this assignment, you’ll be introduced to Python and learn how to import packages and use their functions effectively in your code.

## Guidelines

- Follow good programming practices by using descriptive variable names, maintaining appropriate spacing for readability, and adding comments to clarify your code.

- Ensure written responses use correct spelling, complete sentences, and proper grammar.

**Name:**

**Section:**

**Date:**

Let's get started!

### What are Python libraries and packages?

In Python, **libraries** and **packages** help you do more with less code. A **library** is a collection of pre-written code that provides tools and functions for specific tasks—like working with data, making graphs, or doing math. A **package** is a way to organize that code into structured folders and modules so it's easier to manage and reuse.

Libraries often include their own custom data structures to make working with certain types of data easier. For example, the:

- `datascience` package provides a `Table` data structure for analyzing tabular data. 

- `numpy` library introduces the `ndarray` for performing fast, efficient computations on numerical data. 

- `matplotlib` library offers tools to create visualizations like bar charts and scatter plots, helping you understand data through graphs.

When you **import a package or library**, you're gaining access to these tools and data structures.
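As a minimal sketch of the common import styles, the example below uses two modules from Python's standard library (`math` and `statistics`) so it runs without installing anything. The alias `stats` is chosen here for illustration only; it is not a required convention.

```python
# Style 1: import the whole module and use its name as a prefix
import math

# Style 2: import the module under a shorter alias (the alias "stats" is our choice)
import statistics as stats

# Style 3: import a single name directly into the current namespace
from math import sqrt

print(math.sqrt(16))               # 4.0
print(stats.mean([1.0, 2.0, 3.0])) # 2.0
print(sqrt(25))                    # 5.0
```

The same three styles apply to third-party libraries such as NumPy once they are installed.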

### Lists

Lists are a versatile data structure in Python, allowing us to store and manipulate collections of data efficiently. However, Python lists have limitations when it comes to performing arithmetic operations directly on their elements. Let’s explore these limitations.

**Question 1.** Create a list named `one_to_ten` that contains the integers from 1 to 10.

In [1]:
one_to_ten = ...

What do you think the code below will do?

```python
one_to_ten * 2
```

Run the cell below to see.

In [2]:
one_to_ten * 2

TypeError: unsupported operand type(s) for *: 'ellipsis' and 'int'
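If you see the `TypeError` above, it most likely means the `...` placeholder in Question 1 has not been replaced yet. In Python, `...` is a real object of type `ellipsis`, and it does not support multiplication. The sketch below reproduces the message with a hypothetical variable named `placeholder`.

```python
# "..." is Python's Ellipsis object, used in this lab as a fill-in-the-blank marker
placeholder = ...
print(type(placeholder).__name__)  # ellipsis

# Multiplying the placeholder raises the same TypeError shown above
try:
    placeholder * 2
    message = None
except TypeError as error:
    message = str(error)

print(message)
```

Once `one_to_ten` holds an actual list, rerun the cell and compare the result with your prediction.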

### NumPy

#### What is NumPy?

- NumPy is a Python library used for working with arrays.

- It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.

- NumPy was created in 2005 by Travis Oliphant. It is an open source project and you can use it freely.

- NumPy stands for Numerical Python.

#### Why use NumPy?

- In Python we have lists that serve the purpose of arrays, but they are slow to process.

- NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.

- The array object in NumPy is called `ndarray`. It comes with many supporting functions that make working with arrays easy.

Arrays are very frequently used in data science, where speed and resources are very important.

**Source:** [W3Schools](https://www.w3schools.com/python/numpy/numpy_intro.asp)

NumPy is not part of Python's standard library, so it has to be installed separately before it can be used. While Python provides built-in data structures like lists, NumPy offers more specialized, efficient tools for numerical and scientific computing. To access these features in a notebook, you must import the NumPy library.

**Question 2.** Import the NumPy library and give it the alias `np`.

In [None]:
...

The alias `np` makes it easier to use NumPy functions. For example, instead of writing `numpy.array()`, you can simply write `np.array()`.

We can create a NumPy array by calling `np.array()` and passing a list as the argument. We can type the list directly, or we can pass the name of a list that has already been defined. For example, the code

```python
np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```

takes the list `[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]` and returns it as a NumPy array:

```
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
```

Run the cell below to see.

In [None]:
np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
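The second option mentioned above, passing the name of an existing list, might look like the sketch below. The list `daily_steps` and its values are made up for illustration.

```python
import numpy as np

# A hypothetical list that was defined earlier in the notebook
daily_steps = [4200, 8100, 6500]

# Pass the list by name; np.array() builds a new ndarray from its values
steps_array = np.array(daily_steps)

print(type(steps_array).__name__)  # ndarray
print(steps_array)                 # [4200 8100 6500]
```

The original list is left unchanged; `steps_array` is a separate object.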

**Question 3.** Convert the list `one_to_ten` into a NumPy array and store it in a variable called `arr`.

In [None]:
arr = ...
arr

Run the code cell below. What do you notice?

In [None]:
arr * 2

NumPy provides a wide range of built-in functions for performing computations on arrays. It includes operations such as addition, multiplication, and statistical calculations, all optimized for handling large datasets.
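A short sketch of a few of these operations is shown below. The array `heights_cm` and its values are hypothetical and are not part of the lab data.

```python
import numpy as np

# Hypothetical measurements, in centimeters
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])

# Arithmetic is applied to every element at once
heights_m = heights_cm / 100     # convert each value to meters
heights_plus_5 = heights_cm + 5  # add 5 to each value

# Summary statistics over the whole array
print(heights_cm.sum())    # 660.0
print(np.mean(heights_cm)) # 165.0
print(np.max(heights_cm))  # 180.0
```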

**Question 4.** Load the `world_population.csv` file into a 1-dimensional NumPy array using the `loadtxt()` function. The dataset contains world population estimates from 1950 to 2025. More information on the dataset can be found in the [World Population Datasheet](https://docs.google.com/document/d/16l713CUyuwHvWURkGiUJB6kh5G09ihiNioG7wBjQECQ/edit?usp=sharing).

**Note:** To use functions from the NumPy library, we need to prefix them with the alias we set up in our import statement.
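To get a feel for how `np.loadtxt()` behaves, here is a minimal sketch that reads from an in-memory stand-in for a file rather than the lab's dataset. The contents of `fake_file` and `fake_file_with_header` are invented, and whether the real file has a header row is not assumed here; check the file before deciding if `skiprows` is needed.

```python
import io
import numpy as np

# Hypothetical file contents: one number per line, no header row
fake_file = io.StringIO("2.5\n2.6\n2.7\n")
values = np.loadtxt(fake_file)
print(values.shape)  # (3,)

# If a file starts with a header line, skiprows=1 tells loadtxt to ignore it
fake_file_with_header = io.StringIO("population\n2.5\n2.6\n")
values_no_header = np.loadtxt(fake_file_with_header, skiprows=1)
print(values_no_header.shape)  # (2,)
```

With a real file, you would pass its path as a string in place of the `io.StringIO` object.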

In [None]:
population = ...
population

**Question 5.** Use NumPy’s `diff()` function to calculate the year-to-year change in population values.
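As an illustration of what `np.diff()` returns, here is a small sketch on a made-up array (`readings` is a hypothetical name, not part of the lab data). Note that the result has one fewer element than the input, because each output value is the gap between two neighbors.

```python
import numpy as np

# Hypothetical sequence of measurements
readings = np.array([10, 13, 19, 20])

# Each output element is readings[i + 1] - readings[i]
changes = np.diff(readings)

print(changes)                      # [3 6 1]
print(len(readings), len(changes))  # 4 3
```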

In [None]:
diff = ...
diff

**Question 6.** Calculate the percent change in population.
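One common definition of percent change divides each change by the value it started from and multiplies by 100. The sketch below applies that definition to made-up numbers; the names `amounts`, `changes`, and `percent_changes` are hypothetical, and your instructor may intend a different convention.

```python
import numpy as np

# Hypothetical values at three consecutive time points
amounts = np.array([100.0, 150.0, 300.0])

# Change between consecutive values: [50., 150.]
changes = np.diff(amounts)

# Divide each change by the earlier value (all but the last element), then scale to percent
percent_changes = changes / amounts[:-1] * 100

print(percent_changes)  # a 50% increase, then a 100% increase
```

The slice `amounts[:-1]` drops the final element so that both arrays have the same length.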

In [None]:
pct_change = ...
pct_change 

### Matplotlib

#### What is Matplotlib?

- Matplotlib is a popular Python library used for creating static, animated, and interactive visualizations.

- It was originally developed by John D. Hunter in 2003 and is now maintained by a large community of contributors.

- Matplotlib is highly customizable and works well with other scientific computing libraries like NumPy.

- It is widely used in data science and engineering for making line plots, bar charts, scatter plots, histograms, and more.

#### Why use Matplotlib?

- Python doesn’t have built-in tools for plotting data, so Matplotlib fills this gap with a plotting interface.

- It gives you control over aspects of your figures like titles, axes, colors, grids, and more.

- `matplotlib.pyplot` is its most commonly used module, often imported with the alias `plt`.

Visualizations are important in data science for exploring trends, making comparisons, and communicating results.

**Source:** [Matplotlib Docs](https://matplotlib.org/stable/users/index.html)

We need to import Matplotlib explicitly because it is not included with the core Python installation. Once installed and imported, it gives you tools to create meaningful data visualizations from arrays, tables, or statistical models.

Run the cell below to import Matplotlib with the appropriate alias.

In [None]:
import matplotlib.pyplot as plt

We can visualize the percent change in population over time using a line chart. Since the percent change values are already stored in a NumPy array called `pct_change`, the next step is to create a matching array of years. The `np.array()` function can accept iterables like `range(5)` as input.

Run the code cells below to see examples.

In [None]:
# Creates an array of the values 0 to 4 (range starts at 0 and excludes the stop value)
np.array(range(5))

In [None]:
# Creates an array of the values 1 to 5
np.array(range(1, 6))

**Question 7.** Create a NumPy array for the years 1950 - 2024.

In [None]:
year = ...
year

Now we can create a line chart.

In [None]:
# Assign the x-axis values (years) and y-axis values (percent change)
x = year
y = pct_change

# Create a line plot with years on the x-axis and percent change on the y-axis
plt.plot(x, y)

# Set the y-axis to start at 0
plt.ylim(bottom = 0)

# Label the x-axis
plt.xlabel('Year')

# Label the y-axis
plt.ylabel('Percent Change (%)')

# Add a title to the chart
plt.title('Population Growth Over Time');

### Datascience

#### What is the Datascience Package?

- The Datascience package is a Python library developed by UC Berkeley for teaching introductory data science.

- It provides a simplified interface for working with tabular data, making it accessible to beginners.

- The main feature is the `Table` class, which allows students to load, explore, transform, and visualize data.

#### Why use Datascience?

- It is beginner-friendly and designed specifically for students with little or no prior programming experience.

- The `Table` data structure makes it easy to perform common data analysis tasks like filtering, grouping, and plotting.

- Built-in methods like `.show()`, `.scatter()`, `.group()`, and `.where()` help users focus on learning data concepts rather than syntax.

The `datascience` package is a great starting point for learning data science fundamentals before transitioning to more complex tools.

**Source:** [Data 8 Reference](https://www.data8.org/datascience/)

Since `datascience` is not a built-in module, you must install and import it before use. Once imported, it offers a toolkit for working with real datasets in educational settings.

To import the `datascience` package use the code

```python
from datascience import *
```

`from datascience import *` means *“Import everything from the datascience package into the current namespace.”* Here:

- `datascience` is the name of the package

- `*` means import all public classes, functions, and variables that the package makes available.


This gives us direct access to tools like `Table`, plots, and functions without needing to prefix them with `datascience`. After this import, you can do:

```python
Table().with_columns('A', [1, 2, 3])
```

instead of:

```python
datascience.Table().with_columns('A', [1, 2, 3])
```

**Question 8.** Import the `datascience` package.

In [None]:
...

We can load a world population dataset that has more features (variables) than just the population. More information on the dataset can be found in the [World Population Data Datasheet](https://docs.google.com/document/d/1dBuTsb7YAgj5h8Zp_CoQnfLeFh4Q-SKC54lLZgK08Wc/edit?usp=sharing).

Run the cell below to load the `world_population_data.csv` dataset.

In [None]:
world_population = Table.read_table('data/world_population_data.csv')
world_population

**Question 9.** What do you notice? What do you wonder? Take a moment to reflect on the chart or data above. Discuss your observations and questions with a partner, then write down three insights or curiosities in the cell below.

_TYPE YOUR RESPONSE IN THIS CELL REPLACING THIS TEXT_

**Question 10.** Refer to [Python Reference](https://www.data8.org/fa23/reference/) to find the method used to determine the number of rows, the number of columns, and the column labels in a table. Then, use `print()` statements and the appropriate method(s) to display the number of rows and columns in the `world_population` table.

In [None]:
...

## Submission

Make sure that all cells in your assignment have been executed to display all output, images, and graphs in the final document.

**Note:** Save the assignment before proceeding to download the file.

After downloading, locate the `.ipynb` file and upload **only** this file to Moodle. The assignment will be automatically submitted to Gradescope for grading.