# Python Libraries and Numpy

In this notebook, we will take a look at Python libraries in general and a specific library
that is useful in data science, called `numpy`.

## Importing Libraries

A library is a collection of functions and tools that typically serve a related purpose. Libraries
save you from having to write a lot of code yourself. Python distinguishes between libraries, modules and
packages, which all serve the same general role but are technically different. For ease of
discussion, we will use the term library to cover all of these.

To use a library, you must first import it using the `import` keyword. You can also use the `as` keyword to provide an alias, for example if you don't want to have to type a long library name repeatedly.
In the example below, we import the library `my_library` using the alias `ml`.
```
    import my_library as ml
```
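
The name `my_library` above is only a placeholder. As a minimal sketch of the same pattern with a real library, the example below imports `statistics`, which ships with Python, under the alias `st` (the alias and the sample values are chosen here purely for illustration).

```python
# 'statistics' is part of Python's standard library, so nothing needs installing.
import statistics as st

values = [2, 4, 6, 8]

# Functions in the library are reached through the alias, followed by a dot.
print(st.mean(values))    # average of the values
print(st.median(values))  # middle value
```

Once an alias is set, the rest of your code uses the alias in place of the full library name.
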
## Numpy

The `numpy` library (pronounced "num-pie") is a library of numerical functions and tools that are useful in scientific computing, data science and machine learning.

Let's import it and use some of its features. We will start by looking at what you can do with numpy arrays, which are basically lists of numbers (or some other type of data).

In [None]:
# Import the numpy library and refer to it by the name 'np'
import numpy as np

# Create a numpy array from a list
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Print the array
print(x)

## Numpy Array Properties

An array is basically a list of items, similar to Python lists and tuples. One key feature of `numpy` arrays is that the items must all be the same data type, for example `int` or `float`.
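
As a small sketch of what "same data type" means in practice, the example below uses the `dtype` property (not otherwise covered in this notebook) to show the type numpy chose for two arrays. The variable names and values are made up for illustration.

```python
import numpy as np

whole_numbers = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.5, 3])

# The dtype property reports the single data type shared by all items.
print(whole_numbers.dtype)  # an integer type (the exact name depends on your system)
print(mixed_numbers.dtype)  # float64

# The integers 1 and 3 were converted to floats so that every item matches.
print(mixed_numbers)
```

When a list mixes integers and floats, numpy converts everything to `float` rather than keeping two types in one array.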

Arrays can be one-dimensional like a list:
```
1 2 3 4 5
```
or they can be multi-dimensional, for example with rows and columns.
```
1 2 3 4 5
0 2 4 6 8
```
The code below gives us information about the properties of our array `x`.

In [None]:
# The size property tells you how many items there are in total.
print("Size of array x:", x.size)

# The ndim property tells you how many dimensions - in our case 1.
print("Number of dimensions of x:", x.ndim)

# The shape property gives the length of the array along each dimension.
# Because x has only one dimension, the shape is just the number of items, with no second number.
print("Shape of x:", x.shape)

# We can reshape an array. For example, we can turn a one-dimensional array of 10 items
# into a 2 x 5 array (2 rows and 5 columns).
x_reshaped = np.reshape(x, (2, 5))
print("Reshaped array:", x_reshaped)
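
The two-row layout shown earlier in this section can also be created directly, without reshaping. The sketch below assumes a list of lists as input, one inner list per row; the name `table` is made up for illustration.

```python
import numpy as np

# A list of lists becomes an array with rows and columns.
table = np.array([[1, 2, 3, 4, 5],
                  [0, 2, 4, 6, 8]])

print("Size of table:", table.size)                  # 10 items in total
print("Number of dimensions of table:", table.ndim)  # 2
print("Shape of table:", table.shape)                # (2, 5): 2 rows, 5 columns
```

For a two-dimensional array, the shape lists the number of rows first and the number of columns second.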

## Summarizing Arrays

We can easily get summary values for an array using the following methods:

- `x.mean()` returns the numerical mean (average) for all the values in the array `x`.
- `x.max()` returns the maximum value.
- `x.min()` returns the minimum.
- `x.std()` returns the standard deviation (a measure of spread).

Run the code below to see the value of these for our array `x`.

In [None]:
# Calculate some statistics - average value, standard deviation, max value, min value
print(f"The mean (average) is {x.mean():.2f}")
print(f"The standard deviation is {x.std():.2f}")
print(f"The max value is {x.max():.2f}")
print(f"The min value is {x.min():.2f}")
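
To make "a measure of spread" more concrete, here is a sketch of how the standard deviation can be worked out step by step. It uses the same numbers as `x` under the hypothetical name `values`, and assumes the default behaviour of `.std()`, which divides by the number of items.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Step 1: how far is each value from the mean?
differences = values - values.mean()

# Step 2: square the differences and average them (this is the variance).
variance = (differences ** 2).mean()

# Step 3: take the square root to get back to the original units.
manual_std = np.sqrt(variance)

print(f"Calculated step by step: {manual_std:.2f}")
print(f"Using values.std():      {values.std():.2f}")
```

Both lines should print the same number, which shows that `.std()` is a shortcut for these three steps.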


## Modifying Arrays

With `numpy`, it is easy to change all the values in an array with a line of code. Some examples are shown below.

In [None]:
# Increase every value by 1
print("Original array:", x)
x = x + 1
print("Array with values increased by 1:",x)

In [None]:
# Multiply every value by 3 and add 1
print(x)
x = 3*x + 1
print(x)

In [None]:
# Take the square root of each number in the array
y = np.sqrt(x)
print(y)

In [None]:
# If we don't need to see all the decimal places, we can set the numpy print options
# to display just a few decimal places.
np.set_printoptions(floatmode='fixed', precision=2)
print(y)
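
For comparison, the sketch below applies the same "multiply by 3 and add 1" change to a plain Python list and to a numpy array. The names and values are made up for illustration.

```python
import numpy as np

values_list = [1, 2, 3, 4, 5]
values_array = np.array(values_list)

# With a plain list, each item has to be visited one at a time.
list_result = [3 * v + 1 for v in values_list]

# With a numpy array, the arithmetic applies to every item at once.
array_result = 3 * values_array + 1

print(list_result)
print(array_result)
```

Writing `3 * values_list + 1` would not work for the plain list: multiplying a list by 3 repeats it three times, and adding a number to a list raises a `TypeError`.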

## Exercise 1

Copy and paste the following list called `num_list` into the code block below. Then answer the questions below.
```
num_list = [3, -2, 7, 10, 24, -15]
```
1. Use the code `a = np.array(num_list)` to store the list in a numpy array called `a`.
2. Print the size, number of dimensions and shape of the array `a`.
3. Print the maximum and minimum values of the array.
4. Create a new array called `b` whose values are equal to 5 times the values in the array `a`.
5. Print the mean of all the values in the array `b`.

In [None]:
# Type your code here


## Combining and Slicing Arrays

It is easy to combine two or more numpy arrays. You can also use indexing to reference parts of an array. This is very similar to the indexing we've used to reference parts of a string or list.

Run the code below to see some examples.

In [None]:
# Create two different numpy arrays called p and q
p = np.array([1, 2, 3, 4, 5])
q = np.array([0, 4, 5, 3, -1])

# Add the arrays together, element by element
print(p+q)

# Combine the arrays element by element: 2 times each value in p plus 3 times the matching value in q
print(2*p+3*q)

# Slice the array q so that we keep the first 3 elements
r = q[0:3]
print(r)
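
A few more slicing patterns are sketched below, using the same values as `q`. These follow the same rules as slicing a Python list.

```python
import numpy as np

q = np.array([0, 4, 5, 3, -1])

print(q[1:4])   # items at positions 1, 2 and 3 (the end position 4 is not included)
print(q[:2])    # leaving out the start means "from the beginning"
print(q[-2:])   # negative positions count from the end: the last 2 items
print(q[0])     # a single position returns one item rather than an array
```

Remember that positions start at 0, so `q[1:4]` begins with the second item.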

## Exercise 2

Here are two numpy arrays:
```
arr1 = np.array([5, 7, -2, 9, 10, 12])
arr2 = np.array([-3, 4, 8, 11, 0, -5])
```

1. Copy and paste the two arrays into the code block below.
2. Write the code to create a third numpy array, `arr3`, whose elements are the sums of the matching elements of `arr1` and `arr2`. Print `arr3`.
3. Finally, create an array called `arr4` which only includes the first four elements of `arr3` (so dropping the last two), and then print `arr4`.

In [None]:
# Type your code here


## Loading Data from a File

You can load data from a CSV (Comma-Separated Values) file and store it in a numpy array.

Let's load some test scores from the file `test_scores.csv`. This file has 3 rows of data, one for each of
3 different tests taken by a class of 12 students. The first test was out of 10, the second out of 20 and the third out of 100. Each column gives the scores for a particular student.

If you haven't yet downloaded the CSV file to your notebook folder, you can use **File > Open from URL...** and copy and paste this URL: https://raw.githubusercontent.com/guyfrancis/dat1001/refs/heads/main/test_scores.csv

In [None]:
file = "test_scores.csv"
d = np.loadtxt(file, usecols=range(1, 13), skiprows=1, delimiter=",")
print(d)

In [None]:
# The shape of the array d will tell you there are 3 rows each with 12 columns.
d.shape
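
To see what the `skiprows`, `usecols` and `delimiter` arguments do, here is a self-contained sketch that reads a tiny made-up CSV held in a string instead of a file. It assumes a layout with a header row and a label in the first column, which is what the arguments used for `test_scores.csv` suggest; the contents and the name `example` are invented for illustration.

```python
import io
import numpy as np

# Made-up file contents: a header row, then one row per test.
# The first column holds a text label, the rest hold scores.
csv_text = """Test,Student 1,Student 2,Student 3
Test 1,7,9,10
Test 2,15,18,12
"""

# io.StringIO lets numpy read the text as if it were a file on disk.
example = np.loadtxt(
    io.StringIO(csv_text),
    delimiter=",",        # values are separated by commas
    skiprows=1,           # skip the header row
    usecols=range(1, 4),  # read columns 1, 2 and 3, skipping the label in column 0
)

print(example)
print(example.shape)  # (2, 3): 2 rows, 3 columns
```

Note that `range(1, 4)` stops before 4, so it selects exactly three columns. By default, `np.loadtxt` stores the values as floats.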

In [None]:
# We'll put each row in its own array corresponding to the test number
test1 = d[0]
test2 = d[1]
test3 = d[2]
print("Test 1 results:", test1)
print("Test 2 results:", test2)
print("Test 3 results:", test3)

## Some Data Processing

Let's do some data processing on these test scores. 

1. First, we want to remove the last two students from each set of scores as they dropped the class.
2. Second, we want to rescale the scores for the first two tests so they are out of 100.
3. Third, we will add the scores together so we get a total score for each student over the 3 tests.
4. Finally, we will divide the total scores by 3 so we get an average out of 100 for each student.


In [None]:
# Drop the last two students' scores from each test and print the results
test1 = test1[0:10]
test2 = test2[0:10]
test3 = test3[0:10]
print("Test 1 results:", test1)
print("Test 2 results:", test2)
print("Test 3 results:", test3)

In [None]:
# Rescale tests 1 and 2 so they are out of 100 and print the result
test1 = test1*10
test2 = test2*5
print("Test 1 results:", test1)
print("Test 2 results:", test2)
print("Test 3 results:", test3)
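
The multipliers 10 and 5 come from dividing the new maximum (100) by the old maximum (10 or 20). The sketch below works through this for a test out of 20, using made-up scores and hypothetical variable names.

```python
import numpy as np

# Made-up scores on a test marked out of 20.
scores = np.array([15, 18, 12])
marked_out_of = 20

# The scale factor is the new maximum divided by the old maximum.
scale = 100 / marked_out_of
rescaled = scores * scale

print("Scale factor:", scale)
print("Rescaled scores:", rescaled)
```

A score of 15 out of 20 becomes 75 out of 100, which is the same proportion of the available marks.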

In [None]:
# Get a total score for each student
total = test1+test2+test3
print(total)

In [None]:
# Get an average score for each of the students
total_average = total/3
print(total_average)

In [None]:
# We don't like all the decimals, so we will round each score to 1 decimal place
tot_ave_rnd = np.around(total_average, decimals=1)
print(tot_ave_rnd)

In [None]:
# Finally, let's work out the lowest, highest and mean of the average scores for the 10 students
print(tot_ave_rnd.min())
print(tot_ave_rnd.max())
print(tot_ave_rnd.mean())
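
As an optional aside, the same steps can be sketched more compactly by slicing the two-dimensional array directly. This uses a row-and-column form of indexing, `[:, 0:2]`, that goes slightly beyond what we have covered so far, and the scores below are made up (3 tests for 4 students, keeping the first 2 students).

```python
import numpy as np

# Made-up scores: 3 tests (rows) for 4 students (columns).
scores = np.array([[7, 9, 10, 6],
                   [15, 18, 12, 20],
                   [70, 85, 90, 65]])

# Keep every row (:) but only the first 2 columns, dropping the last 2 students.
kept = scores[:, 0:2]

# Rescale tests 1 and 2 to be out of 100, then average the three tests.
average = (kept[0] * 10 + kept[1] * 5 + kept[2]) / 3

print(np.around(average, decimals=1))
```

The result has one average per remaining student, just like `tot_ave_rnd` in the cells above.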

## Exercise 3

We'll use what we have learned to analyze some real data. The file "Temp_Data_Denver_December_2025.csv" contains the daily high and low temperatures for Denver for the month of December 2025. These data are taken from the following website: [Weather Underground](https://www.wunderground.com/)

First, copy the file over from Github if you haven't already:

- Copy this URL: https://raw.githubusercontent.com/guyfrancis/dat1001/refs/heads/main/Temp_Data_Denver_December_2025.csv
- Use **File > Open from URL...** and paste in the URL above.
- Before you run any Python code, take a look at the CSV file. You'll see that each column represents one day in December. There are three rows, corresponding to the daily high temperature, daily average temperature and the daily low temperature. The units are Fahrenheit.

Now answer the questions below in order.
1. Use the code in the first code cell below to load the temperature data into a numpy array.
2. Print the contents of the array `temp_table` and use the `.shape` property to check the shape of the data.
3. Use indexing to create a single one-dimensional numpy array for the first row (daily maximum). Call this `temp_max`. (Avoid naming your arrays `max`, `min` or `range`, because those names would hide Python's built-in functions.)
4. Use indexing to create a single one-dimensional numpy array for the third row (daily minimum). Call this `temp_min`.
5. Now create a new numpy array from the `temp_max` and `temp_min` arrays that represents the daily temperature **range** (max - min). Call this `temp_range`.
6. Find the greatest and least daily temperature ranges for the month of December.
7. Find the average daily temperature range for the month of December.
8. Finally, convert your temperature range, `temp_range`, into a new array called `temp_range_degC` which gives the range in degrees Centigrade. To convert the range from Fahrenheit to Centigrade, you will need to multiply the values in `temp_range` by 5 and divide by 9. Print the average daily range in degrees Centigrade.
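
You may wonder why converting a temperature range does not involve subtracting 32, as converting a single temperature does. The sketch below checks this with made-up highs and lows (not the Denver data); all names are hypothetical.

```python
import numpy as np

# Made-up highs and lows in Fahrenheit for 3 days.
high_f = np.array([50.0, 59.0, 41.0])
low_f = np.array([32.0, 32.0, 32.0])

# Full conversion of each temperature: subtract 32, then multiply by 5/9.
high_c = (high_f - 32) * 5 / 9
low_c = (low_f - 32) * 5 / 9

# Converting the range directly needs only the factor 5/9,
# because the 32 cancels when one temperature is subtracted from the other.
range_direct = (high_f - low_f) * 5 / 9

print(high_c - low_c)
print(range_direct)
```

Both lines print the same values, so multiplying the range by 5 and dividing by 9 is enough.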


In [None]:
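# Load Denver temperature data
# Note: usecols=range(1, 31) reads columns 1 to 30 and skips column 0.
# If the file holds all 31 days of December, range(1, 32) would be needed to include the last day.
file = "Temp_Data_Denver_December_2025.csv"
temp_table = np.loadtxt(file, usecols=range(1, 31), skiprows=1, delimiter=",")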

In [None]:
# Type the rest of your code here
