![Code First: Girls](images/logo_large.png)

#### Python Session 6

# Part 2: Introduction to numpy & more matplotlib

Download the following files from Google Drive:
‘book_analysis_code.py’ and ‘book_dataset.csv’

All work will be done in the ‘book_analysis_code_part2.py’ file.

### What are numpy arrays?

A numpy array is a container of values that are all of the same type. It can have one dimension (like a list of numbers) or more (like a table of rows and columns).

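Numpy arrays are often faster than lists for mathematical operations, because numpy applies each operation to the whole array in compiled code instead of looping over the values one at a time in Python.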

It is easy to perform addition, subtraction, multiplication, division and more complex mathematical operations using numpy arrays.

#### Our first numpy array

In [None]:
import numpy as np # np can be used as a shorthand

my_first_array = np.array((1, 2, 3))

print(my_first_array)
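The same `np.array` call can build arrays with more than one dimension, and numpy converts mixed values to a single common type. A short sketch using the standard `shape` and `dtype` attributes; the values are arbitrary:

```python
import numpy as np

# A 2-dimensional array: 2 rows, 3 columns
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(grid.shape)   # (2, 3)
print(grid.dtype)   # an integer type, e.g. int64 (depends on the platform)

# Mixing ints and floats: numpy stores everything as one common type (float)
mixed = np.array([1, 2.5, 3])
print(mixed)        # [1.  2.5 3. ]
print(mixed.dtype)  # float64
```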

Arithmetic on a numpy array is applied to every element at once. Start with a small array:

In [None]:
array_one = np.array((1, 2, 3))

print(array_one)

Add 5 to each number in the array

In [None]:
array_one += 5 # This means: array_one = array_one + 5

print(array_one)

Multiply each number in the array by 10

In [None]:
array_one *= 10 # This means: array_one = array_one * 10

print(array_one)
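The other arithmetic operators work the same way. A minimal sketch: writing `array_one - 1` (rather than `array_one -= 1`) returns a new array and leaves the original unchanged, and `/` always produces floats:

```python
import numpy as np

array_one = np.array((1, 2, 3))

print(array_one - 1)    # [0 1 2]
print(array_one ** 2)   # [1 4 9]
print(array_one / 2)    # [0.5 1.  1.5]  (division gives floats)

# array_one itself is unchanged: these operators return new arrays
print(array_one)        # [1 2 3]
```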

##### Arrays can be indexed just like lists

In [None]:
array_one = np.array((9, 4, 3))

# First element
print(array_one[0])


# Last element
print(array_one[-1])


# Get first two elements
print(array_one[:2])
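Other list-style slices work too, and, as with a list, a single element can be changed by assigning to its index. A short sketch:

```python
import numpy as np

array_one = np.array((9, 4, 3))

print(array_one[1:])     # everything from index 1 onwards: [4 3]
print(array_one[::-1])   # reversed: [3 4 9]

# As with a list, one element can be replaced in place
array_one[0] = 100
print(array_one)         # [100   4   3]
```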


You can also make arrays of ones:

In [None]:
ones_array = np.ones(10)

print(ones_array)

And zeros:

In [None]:
zeros_array = np.zeros(10)

print(zeros_array)
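`np.ones` and `np.zeros` give floats by default, which is why the output shows `1.` and `0.`. A short sketch of two standard options: the `dtype` argument for whole numbers, and a tuple to ask for a 2-D shape:

```python
import numpy as np

int_ones = np.ones(5, dtype=int)   # whole numbers instead of floats
print(int_ones)                    # [1 1 1 1 1]

zeros_grid = np.zeros((2, 3))      # 2 rows, 3 columns of 0.0
print(zeros_grid.shape)            # (2, 3)
```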

Numpy also provides many functions that work on a whole array at once, such as finding the sum:

In [None]:
array_one = np.array((1, 2, 3))

array_sum = np.sum(array_one)
print(array_sum)

Or the mean:

In [None]:
array_one = np.array((1, 2, 3))

array_mean = np.mean(array_one)
print(array_mean)
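A few more of numpy's summary functions, as a short sketch. Most of them are also available as methods on the array itself:

```python
import numpy as np

array_one = np.array((1, 2, 3))

print(np.max(array_one))   # 3
print(np.min(array_one))   # 1
print(np.std(array_one))   # population standard deviation, about 0.816

# The same calculations written as array methods
print(array_one.sum())     # 6
print(array_one.mean())    # 2.0
```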

You can easily perform operations on multiple arrays, such as adding them together:

In [None]:
array_one = np.array((1, 2, 3))

array_two = np.array((1, 6, 8))

add_array = array_one + array_two
print(add_array)

Or multiplying them, again element by element:

In [None]:
array_one = np.array((1, 2, 3))

array_two = np.array((1, 6, 8))

multiply_array = array_one*array_two
print(multiply_array)
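This is one place where arrays and lists behave differently: `+` joins two lists end to end, but adds two arrays element by element. Arrays also need matching lengths. A short sketch:

```python
import numpy as np

list_one = [1, 2, 3]
list_two = [1, 6, 8]
print(list_one + list_two)     # lists join together: [1, 2, 3, 1, 6, 8]

array_one = np.array(list_one)
array_two = np.array(list_two)
print(array_one + array_two)   # arrays add element by element: [ 2  8 11]

# Arrays of different lengths cannot be combined element by element
try:
    np.array((1, 2, 3)) + np.array((1, 2))
except ValueError as error:
    print('Error:', error)
```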

#### Back to our data: How do we calculate the value of our current stock?

- Multiply the number of copies of each book by the price of the book:

Calculate Num_copies_stock * Price for each book.

<center><img src="images/data_pic/book_dataset.png" width="1000"></center>

What is one way of solving this?

1. Extract array of Num_copies_stock
1. Extract array of prices
1. Multiply the arrays Num_copies_stock and prices together
1. Sum the result to find the total stock value
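The steps above can be tried on a tiny example before touching the real dataset. The stock counts and prices below are invented purely for illustration, and the arrays are typed in by hand in place of steps 1 and 2:

```python
import numpy as np

# Hypothetical stock counts and prices for three books
num_stock = np.array((3, 5, 2))
prices = np.array((7.99, 12.50, 5.00))

value_per_book = num_stock * prices     # step 3: stock * price, element by element
stock_value = np.sum(value_per_book)    # step 4: add up every book's value

print(value_per_book)
print('£{:.2f}'.format(stock_value))    # £96.47
```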

#### Extracting number of copies of each book

In [None]:
import csv
import numpy as np
import matplotlib.pyplot as plt

# Open the dataset
with open('book_dataset.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)

    # read the headers of the csv
    headers = reader.fieldnames

    # create list to store the book data
    book_data = []

    # add book data dictionaries to list from .csv
    for row in reader:
        book_data.append(row)

In [None]:
num_stock = np.array(()) # create an empty array

for book in book_data:
    num_stock = np.append(num_stock,int(book['Num_copies_stock'])) # remember to convert to integer
    
print(num_stock)
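`np.append` returns a brand-new copy of the array each time it is called. That is fine for a small file, but an alternative is to collect the values with a list comprehension and convert to an array once. A sketch, where the two `book_data` rows are hypothetical stand-ins for the dictionaries `csv.DictReader` produces (every value arrives as a string):

```python
import numpy as np

# Hypothetical rows shaped like csv.DictReader output (all values are strings)
book_data = [
    {'Book': 'Book A', 'Num_copies_stock': '4'},
    {'Book': 'Book B', 'Num_copies_stock': '10'},
]

# Convert each value to int inside the list comprehension, then build one array
num_stock = np.array([int(book['Num_copies_stock']) for book in book_data])
print(num_stock)   # [ 4 10]
```

Unlike the `np.append` version, which starts from an empty float array, this array keeps integer values.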

**Exercise 2:** Complete the task of calculating the value of our stock:

1. Extract array of num_stock 
1. Extract array of prices
1. Multiply the arrays num_stock and prices together
1. Sum the result to find the total stock value
1. Print the stock value

Solution

In [None]:
num_stock = np.array(())  # Create empty array for number of copies
stock_price = np.array(()) # Create empty array for stock prices

for book in book_data:
    num_stock = np.append(num_stock,int(book['Num_copies_stock'])) # remember to convert to integer
    stock_price = np.append(stock_price,float(book['Price'])) # remember to use 'float' as prices have decimals

print('Num stock: ', num_stock)
print('Stock prices: ', stock_price)

stock_value = np.sum(num_stock*stock_price) # Calculate the stock value

print('The value of the stock is £{:.2f}'.format(stock_value)) # round to 2 decimal places

#### Next Question: How much revenue did we make each month?
Calculate price * copies sold for each month per book.

Sum up the total revenue per month. 

<center><img src="images/data_pic/book_dataset.png" width="1000"></center>

#### Method one: Extract all the arrays, then do the calculation

In [None]:
stock_price = np.array(()) 
jan_sales = np.array(())
feb_sales = np.array(())
march_sales = np.array(())

for book in book_data:
    stock_price = np.append(stock_price,float(book['Price']))
    jan_sales = np.append(jan_sales,int(book['Jan']))
    feb_sales = np.append(feb_sales,int(book['Feb']))    
    march_sales = np.append(march_sales,int(book['March']))   

jan_rev = np.sum(jan_sales*stock_price)
print(jan_rev)

feb_rev = np.sum(feb_sales*stock_price)
print(feb_rev)

march_rev = np.sum(march_sales*stock_price)
print(march_rev)

Problem: Lots of repetition

#### Method two: Calculate all the sales in the loop

In [None]:
jan_sales = np.array(())
feb_sales = np.array(())
march_sales = np.array(())

for book in book_data:
    jan_sales = np.append(jan_sales, float(book['Price']) * int(book['Jan']))
    feb_sales = np.append(feb_sales, float(book['Price']) * int(book['Feb']))
    march_sales = np.append(march_sales, float(book['Price']) * int(book['March']))   

jan_rev = np.sum(jan_sales)
feb_rev = np.sum(feb_sales)
march_rev = np.sum(march_sales)

print(jan_rev)
print(feb_rev)
print(march_rev)

Problem: Still lots of repetition

#### Method three: One for-loop

In [None]:
revenue_months = np.zeros(6) # Pre-allocate an array of six zeros, one running total per month
print(revenue_months)

for book in book_data:
    revenue_months[0] += float(book['Price']) * int(book['Jan'])
    revenue_months[1] += float(book['Price']) * int(book['Feb'])
    revenue_months[2] += float(book['Price']) * int(book['March'])
    revenue_months[3] += float(book['Price']) * int(book['April'])
    revenue_months[4] += float(book['Price']) * int(book['May'])
    revenue_months[5] += float(book['Price']) * int(book['Jun'])
    
print(revenue_months)

Much better! But: Is there a way to remove the repetition in the loop?

#### Another solution: Use a nested for-loop
Nested for-loops are for-loops within another for-loop.
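Before using one on the book data, here is a minimal nested loop on a small, made-up grid of numbers. The outer loop visits each row and the inner loop visits each position in that row, adding into a running total per column, which is the same pattern the revenue calculation uses:

```python
# A small made-up grid: each inner list is one row
grid = [[1, 2, 3],
        [4, 5, 6]]

column_totals = [0, 0, 0]

for row in grid:                        # outer loop: one row at a time
    for index in range(len(row)):       # inner loop: every position in that row
        column_totals[index] += row[index]

print(column_totals)   # [5, 7, 9]
```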

In [None]:
revenue_months = np.zeros(6)

months = headers[4:] # list of month names, taken from the .csv headers (fifth column onwards)
print(months)

for book in book_data:
    for index in range(len(months)):
        revenue_months[index] += float(book['Price']) * int(book[months[index]])
        
print(revenue_months)
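Python's built-in `enumerate` hands back the index and the month name together, which removes the `months[index]` lookups. A sketch using two hypothetical rows and three months instead of the real data; sizing the array with `len(months)` also avoids hard-coding the 6:

```python
import numpy as np

months = ['Jan', 'Feb', 'March']
# Hypothetical rows standing in for book_data
book_data = [
    {'Price': '5.00', 'Jan': '2', 'Feb': '1', 'March': '0'},
    {'Price': '10.00', 'Jan': '1', 'Feb': '0', 'March': '3'},
]

revenue_months = np.zeros(len(months))

for book in book_data:
    price = float(book['Price'])                # convert the price once per book
    for index, month in enumerate(months):      # (0, 'Jan'), (1, 'Feb'), (2, 'March')
        revenue_months[index] += price * int(book[month])

print(revenue_months)   # [20.  5. 30.]
```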

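What is going on here? The cell below prints every step of the inner loop, and the `break` stops the outer loop after the first book so the output stays short.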

In [None]:
revenue_demo = np.zeros(6) # separate array, so revenue_months is not overwritten before plotting

print(revenue_demo)
print()

for book in book_data:

    for index in range(len(months)):

        print('Book: {}, Loop: {}'.format(book['Book'], index))

        print('Month: ' + months[index])

        print('Price: ' + book['Price'])

        print('Copies sold: ' + book[months[index]])

        print('Month revenue: {}'.format(float(book['Price']) * int(book[months[index]])))

        revenue_demo[index] += float(book['Price']) * int(book[months[index]])

        print('Monthly revenue running total: {}'.format(revenue_demo))
        print()

    break # stop after the first book
        

<center><img src="images/data_pic/book_dataset.png" width="1000"></center>

In [None]:
plt.figure()
plt.plot(months, revenue_months)
plt.ylabel('Revenue (£)')
plt.xlabel('Month')
plt.title('Book sales revenue per month')
plt.savefig('Book_sales_month.png')
plt.show()

<center><img src="images/data_pic/book_sales_month.png" width="800"></center>
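The loops can also be dropped entirely by storing the sales as a 2-D array (one row per book, one column per month) and letting numpy multiply each row by that book's price. A sketch with hypothetical rows; `prices[:, np.newaxis]` turns the prices into a column so that each price lines up with its book's row:

```python
import numpy as np

months = ['Jan', 'Feb', 'March']
# Hypothetical rows standing in for book_data
book_data = [
    {'Price': '5.00', 'Jan': '2', 'Feb': '1', 'March': '0'},
    {'Price': '10.00', 'Jan': '1', 'Feb': '0', 'March': '3'},
]

prices = np.array([float(book['Price']) for book in book_data])              # shape (2,)
sales = np.array([[int(book[m]) for m in months] for book in book_data])    # shape (2, 3)

# Multiply each book's row of sales by its price, then add the rows together
revenue_months = np.sum(prices[:, np.newaxis] * sales, axis=0)
print(revenue_months)   # [20.  5. 30.]
```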

**Exercise 3a:** Calculate how much money we made from selling Children's books each month

**Exercise 3b:** Plot the result using a suitable chart with matplotlib.

Exercise 3a solution

In [None]:
revenue_months_children = np.zeros(6)

months = headers[4:] # list of month names, taken from the .csv headers (fifth column onwards)
print(months)

for book in book_data:
    if book['Genre'] == 'Children':
        for index in range(len(months)):
            revenue_months_children[index] += float(book['Price']) * int(book[months[index]])
        
print(revenue_months_children)
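Exercise 3a repeats almost all of the earlier revenue loop. One way to avoid that is to wrap the loop in a function with an optional genre filter. `monthly_revenue` is a hypothetical name chosen for this sketch, and the rows below are made up:

```python
import numpy as np

def monthly_revenue(book_data, months, genre=None):
    """Total price * copies sold per month, for one genre if `genre` is given."""
    revenue = np.zeros(len(months))
    for book in book_data:
        if genre is not None and book['Genre'] != genre:
            continue  # skip books from other genres
        for index, month in enumerate(months):
            revenue[index] += float(book['Price']) * int(book[month])
    return revenue

months = ['Jan', 'Feb']
# Hypothetical rows standing in for book_data
book_data = [
    {'Genre': 'Children', 'Price': '4.00', 'Jan': '3', 'Feb': '1'},
    {'Genre': 'Fiction', 'Price': '8.00', 'Jan': '1', 'Feb': '2'},
]

print(monthly_revenue(book_data, months))               # all books: [20. 20.]
print(monthly_revenue(book_data, months, 'Children'))   # [12.  4.]
```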

Exercise 3b solution

In [None]:
plt.figure()
plt.plot(months, revenue_months_children)
plt.ylabel('Revenue (£)')
plt.xlabel('Month')
plt.title("Children's book sales revenue per month")
plt.savefig('Childrens_Book_sales.png')
plt.show()
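Because the months are separate categories rather than points on a continuous scale, a bar chart is another reasonable choice for Exercise 3b. A sketch using `plt.bar`; the revenue numbers are placeholders, not results from the dataset, and the `Agg` backend is set only so the script can run without a display:

```python
import matplotlib
matplotlib.use('Agg')  # draw to files only; leave this out in a notebook to see the chart
import matplotlib.pyplot as plt
import numpy as np

months = ['Jan', 'Feb', 'March', 'April', 'May', 'Jun']
revenue_months_children = np.array([120.0, 95.5, 130.0, 80.0, 150.0, 110.0])  # placeholder values

plt.figure()
bars = plt.bar(months, revenue_months_children)  # one bar per month
plt.ylabel('Revenue (£)')
plt.xlabel('Month')
plt.title("Children's book sales revenue per month")
plt.savefig('Childrens_Book_sales_bar.png')
```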

<center><img src="images/data_pic/childrens_book_sales.png" width="800"></center>