# Arrays

Arrays are contiguous sequences of values of a single (homogeneous) type.  The simplest way to emulate one in Python is the `list` datatype.  However, lists do not enforce homogeneity: each element can be of any type.
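
A quick sketch of what that lack of enforcement means in practice. The names `mixed` and `total` are illustrative and are not used elsewhere in this notebook.

```python
# A list happily stores values of different types side by side.
mixed = [1, 2.5, "three"]
print([type(v).__name__ for v in mixed])

# The problem only surfaces later, when we do arithmetic on the elements.
total = 0
try:
    for v in mixed:
        total += v
except TypeError as err:
    print("Cannot sum a mixed list:", err)
```

The list is created without complaint; the failure appears only when the string element is reached during the sum.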

In [None]:
# Define a function over a list
def mymean(x):
    s = 0
    for i in x:
        s += i
    return s / len(x)

In [None]:
# Create a list and check the function output
# x = range(10000)
x = [float(i) for i in range(10000000)]
print(mymean(x))

# Benchmarking

> Do not try to optimize what you have not measured.

Before we try to improve the speed of code, we first need to know:

- whether it can in fact be improved, or whether it is already as fast as the hardware allows
- which parts of the code we should target for optimization
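
Outside a notebook, the standard-library `timeit` module gives the same kind of measurement. The following is a minimal sketch: `mymean` is redefined so the block is self-contained, and the name `data`, the list size, and the repeat counts are arbitrary choices for illustration.

```python
import timeit

def mymean(x):
    s = 0
    for i in x:
        s += i
    return s / len(x)

data = [float(i) for i in range(100000)]

# Each measurement runs the call 10 times; we take 5 such measurements.
times = timeit.repeat(lambda: mymean(data), number=10, repeat=5)

# The minimum is usually the least disturbed by other activity on the machine.
best = min(times) / 10
print(f"best of 5: {best * 1e3:.3f} ms per call")
```

The absolute numbers depend on the machine, so they are only meaningful when compared against other measurements taken in the same environment.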

In [None]:
# The standard-library timing module. Note that the %timeit magic used below
# is built into IPython and does not need this import.
import timeit

In [None]:
# Actually measure the time for the function defined earlier
%timeit mymean(x)

In [None]:
# Another library - this provides other ways of handling arrays
import numpy as np

In [None]:
print(np.mean(x))

In [None]:
%timeit np.mean(x)

In [None]:
xn = np.array(x)
%timeit np.mean(xn)

# Memory Usage

The notebook interface (or `ipython` in general) lets us easily list the names that exist in the namespace, along with some information about each.  Use the `%who` or `%whos` magic commands for this (with automagic enabled, the `%` prefix is optional).

In [None]:
whos
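
To put rough numbers on the memory used by a list versus an array, the sketch below uses `sys.getsizeof` and the array's `nbytes` attribute. The name `values` and the size 1000 are illustrative, and the per-object sizes reported by `sys.getsizeof` are specific to the Python implementation in use.

```python
import sys
import numpy as np

values = [float(i) for i in range(1000)]
arr = np.array(values)

# sys.getsizeof on a list counts only the list object and its table of
# references, not the float objects those references point to.
list_container = sys.getsizeof(values)
list_elements = sum(sys.getsizeof(v) for v in values)

# nbytes is the size of the array's data buffer: 1000 values * 8 bytes each.
print("list container bytes:", list_container)
print("list element bytes:  ", list_elements)
print("array data bytes:    ", arr.nbytes)
```

The array stores the raw 8-byte values back to back, while the list stores references to separate float objects, each of which carries its own overhead.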

# 2-D arrays

How do we represent a 2-D array in Python?  The most obvious way is a list of lists.

In [None]:
xm = [[j for j in range(i, i+100)] for i in range(100)]
print(xm)

In [None]:
print(xm[3][4])

In [None]:
def rowmean(xm):
    s = []
    for l in xm:
        s.append(mymean(l))
    return s

In [None]:
print(rowmean(xm))

In [None]:
xmn = np.array(xm)
print(type(xmn[0]))

In [None]:
np.mean(xmn)

In [None]:
# axis=1 averages along each row, matching rowmean above
print(np.mean(xmn, axis=1))

In [None]:
%timeit rowmean(xm)

In [None]:
# Time the row-wise mean, the NumPy counterpart of rowmean
%timeit np.mean(xmn, axis=1)

In [None]:
print(xmn)
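
The `axis` argument of `np.mean` selects the dimension that is averaged away. The hand-written `rowmean` produces one mean per row, which corresponds to `axis=1`; `axis=0` produces one mean per column. For the `xm` built above the two happen to give the same numbers, because entry (i, j) is i + j, so a small non-square array (the name `small` is illustrative) makes the difference easier to see:

```python
import numpy as np

small = np.array([[1, 2, 3],
                  [4, 5, 6]])

row_means = np.mean(small, axis=1)  # one value per row
col_means = np.mean(small, axis=0)  # one value per column

print(row_means)  # [2. 5.]
print(col_means)  # [2.5 3.5 4.5]
```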

# File I/O

Rather than just creating dummy data, we should be able to read in data from other sources, most often files.  For now, we will assume that the data is stored as text, one row per line.  This is NOT the most efficient way to store data, but it is the easiest to work with.
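
As a rough illustration of the efficiency point, the sketch below writes the same array once as text and once in NumPy's binary `.npy` format and compares the file sizes. The binary format via `np.save` is not used elsewhere in this notebook and is shown only for comparison; the file names and the temporary directory are assumptions made for the example.

```python
import os
import tempfile
import numpy as np

r = np.random.random((100, 100))

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "random.txt")
    binary_path = os.path.join(tmp, "random.npy")

    np.savetxt(text_path, r)  # human-readable text, one row per line
    np.save(binary_path, r)   # NumPy's binary .npy format

    text_size = os.path.getsize(text_path)
    binary_size = os.path.getsize(binary_path)

print("text bytes:  ", text_size)
print("binary bytes:", binary_size)
```

The text file is larger because every 8-byte value is written out as a long string of digits, but it can be opened and inspected in any editor.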

In [None]:
# Generate random data with NumPy and save it as text.
# Uncomment the two lines below if random.txt does not exist yet.
import random
#r = np.random.random((100,100))
#np.savetxt('random.txt', r)

In [None]:
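# Now read it into a regular list of lists
rm = []
with open('random.txt', 'r') as f:  # the with block closes the file for us
    for line in f:
        # Each line holds one row of whitespace-separated numbers
        rm.append([float(v) for v in line.split()])
print(rm)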

In [None]:
print(rowmean(rm))

In [None]:
# Alternative using numpy methods - only for appropriately formatted data of course
rmn = np.loadtxt('random.txt')

In [None]:
print(np.mean(rmn, axis=1))

In [None]:
print(rmn[0,0:10])

# Generalize File I/O and Strings

What if we want to read something like free-form text, which numpy's numeric loaders cannot handle?

In [None]:
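# Count how many times the exact word "the" appears in the file
count = 0
with open('randomtext.txt', 'r') as fr:  # the with block closes the file for us
    for line in fr:
        for word in line.split():
            if word == "the":
                count += 1
print(count)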

## Fake files using StringIO

For our testing purposes, we may not always be able to read/write files.  Instead we will create *fake* files where we emulate the behaviour of a file using data from a string.
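
The same idea also works for numeric data, since `np.loadtxt` accepts a file-like object as well as a file name. A minimal sketch, using a made-up three-row data set in place of a real file (the name `fake_file` and the values are illustrative):

```python
import io
import numpy as np

# Hypothetical data, formatted the way a small random.txt would be
fake_file = io.StringIO("0.1 0.2 0.3\n0.4 0.5 0.6\n0.7 0.8 0.9\n")

data = np.loadtxt(fake_file)
print(data.shape)
print(np.mean(data, axis=1))
```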

In [None]:
randstr = """There are many variations of passages of Lorem Ipsum available, 
but the majority have suffered alteration in some form, by injected humour, 
or randomised words which don't look even slightly believable. 
If you are going to use a passage of Lorem Ipsum, you need to be sure 
there isn't anything embarrassing hidden in the middle of text. 
All the Lorem Ipsum generators on the Internet tend to repeat predefined 
chunks as necessary, making this the first true generator on the Internet. 
It uses a dictionary of over 200 Latin words, combined with a handful of 
model sentence structures, to generate Lorem Ipsum which looks reasonable. 
The generated Lorem Ipsum is therefore always free from repetition, 
injected humour, or non-characteristic words etc.
"""

In [None]:
import io
sfr = io.StringIO(randstr)
count = 0
for l in sfr:
    for word in l.split():
        if word == "the":
            count += 1
print(count)
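
Note that the exact comparison `word == "the"` does not count "The" or a word with punctuation attached, such as "the,". One possible refinement, sketched here on a short made-up text rather than on `randstr`, is to strip surrounding punctuation and ignore case before counting. Using `collections.Counter` is a choice made for this example, not something the rest of the notebook relies on.

```python
import io
import string
from collections import Counter

text = io.StringIO("The cat saw the dog.\nThe dog ignored the cat, mostly.\n")

counts = Counter()
for line in text:
    for word in line.split():
        # Normalise: drop surrounding punctuation and ignore case
        counts[word.strip(string.punctuation).lower()] += 1

print(counts["the"])
```

Because a `Counter` tallies every word, the same loop also answers the question for any other word without a second pass over the file.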