## Spatial Data Science (GEO6119)

---

# Lecture 2: Array and Data Frame

<br>
Instructor: Yi Qiang (qiangy@usf.edu)<br>

___

# Recap of Lab 1

### Variables
What will be the output of the following code?


In [None]:
a = 3
b = 'hello'
c = str(a) + b
print(c)

In [None]:
a = 100
b = a + 1
a > b

In [None]:
a = ["Hello","World", 2022]

In [None]:
a[1]
#a[1][0]

In [None]:
x = 5
for i in range(x):
    if i%2 == 0:
        print(x)
    else:
        print(i)

# 1. Numpy Array

- A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers.
- An array can be 1-dimensional, 2-dimensional or n-dimensional. For instance, a raster is essentially a georeferenced 2D array.
- Arrays support element-wise operations that are not supported by other Python data collections (e.g. lists).
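
As a small illustration of the last point, the sketch below compares the `+` operator on lists and on arrays. The variable names and values are made up for this example.

```python
import numpy as np

scores_list = [69, 88, 73]
scores_array = np.array(scores_list)

# "+" between two lists concatenates them
combined_list = scores_list + scores_list

# "+" between two arrays adds them element by element
combined_array = scores_array + scores_array

print(combined_list)   # [69, 88, 73, 69, 88, 73]
print(combined_array)  # [138 176 146]
```

The same operator gives a longer list in one case and an element-wise sum in the other, which is why arrays are preferred for numeric work.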

## 1.1 Create an array
### Creating from a list of the same data type

In [None]:
# Import the Numpy library
import numpy as np

In [None]:
# Create a list of numbers and print it
ls = [69,88,73,90,82,73]
print(ls)
print(type(ls))

In [None]:
# Convert the list to numpy array and print it
a = np.array(ls)

In [None]:
print(a)

In [None]:
# Print the type of the object (a numpy.ndarray)
print(type(a))

### Create a 2D array

In [None]:
array2D = np.array([[1, 2, 3], [4, 5, 6]])
print(array2D)

In [None]:
# Get the shape of the array
array2D.shape

### Create an array of zeros, ones and a random array

In [None]:
# Create an uninitialized 4*3 array (its values are arbitrary)
np.empty((4,3))

In [None]:
# Create a 3*4 array of zeros
zeros = np.zeros((3,4))

# Create a 4*3 array of ones
ones = np.ones((4,3))

# Create a 5*2 array of random numbers
ran = np.random.rand(5,2)

In [None]:
print(zeros)
print(ones)
print(ran)

### Values in an array must be of the same type

In [None]:
# A list with mixed types: NumPy converts every value to a common type (here, strings)
ls = [69,'second',73,90,82,73]
a = np.array(ls)
print(a)
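
A minimal sketch of what NumPy does when the input values have different types. The arrays `mixed` and `numbers` are hypothetical examples; the point is to inspect `dtype` after conversion.

```python
import numpy as np

# Integers mixed with a string: everything becomes a string
mixed = np.array([69, 'second', 73])
print(mixed)
print(mixed.dtype)   # a Unicode string dtype

# Integers mixed with a float: everything becomes a float
numbers = np.array([1, 2, 2.5])
print(numbers.dtype)  # float64

# The "numbers" in the mixed array are now text, not integers
first = mixed[0]
print(first == '69')  # True
```

Because the values in `mixed` are strings, arithmetic such as `mixed + 5` no longer works as it would on a numeric array.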

## 1.2 Arithmetic Operation on Array

In [None]:
array = np.array([69, 88, 73, 90, 82, 73])

# Each element adds 5
print(array + 5)

# Each element times 2
print(array*2)

# Each element squared
print(array**2)

### You can't do arithmetic operations directly on a list.

In [None]:
ls=[69,88,73,90,82,73]
# ls + 5 raises a TypeError; convert the list to an array first
np.array(ls) + 5

### Element-wise multiplication

In [None]:
array1 = np.array([[1, 1], [2, 2]])
array2 = np.array([[3, 3], [4, 4]])

# element-wise
np.multiply(array1,array2)

In [None]:
# does the same thing as np.multiply
array1*array2

### Matrix multiplication

In [None]:
array1

In [None]:
array2

In [None]:
np.dot(array1,array2)
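
To make the difference between the two kinds of multiplication concrete, here is a short sketch using the same two arrays. It also uses the `@` operator, which for 2D arrays gives the same result as `np.dot`.

```python
import numpy as np

array1 = np.array([[1, 1], [2, 2]])
array2 = np.array([[3, 3], [4, 4]])

# Element-wise: each element is multiplied by the element in the same position
elementwise = array1 * array2

# Matrix multiplication: rows of array1 combined with columns of array2
matrix_product = array1 @ array2

print(elementwise)     # [[3 3], [8 8]]
print(matrix_product)  # [[7 7], [14 14]]
```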

### More arithmetic operators

In [None]:
array1

In [None]:
np.sqrt(array1)

In [None]:
array

In [None]:
# Sum of an array
print(array.sum())

# Maximum value
print(array.max())

# Mean
print(array.mean())

# Mean of a slice
print(array[0:3].mean())

### Comparison operator

In [None]:
# Comparing array1 and array2
array1 > array2

### Aggregation operators

In [None]:
# Aggregate array
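
One possible illustration of aggregation, assuming a small hypothetical 2D array named `grid`: methods such as `sum` and `mean` accept an `axis` argument that controls whether values are aggregated down the columns or across the rows.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Aggregate all values into a single number
total = grid.sum()

# axis=0 aggregates down the rows, giving one value per column
column_sums = grid.sum(axis=0)

# axis=1 aggregates across the columns, giving one value per row
row_means = grid.mean(axis=1)

print(total)        # 21
print(column_sums)  # [5 7 9]
print(row_means)    # [2. 5.]
```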

## 1.3 Accessing attributes of an array

In [None]:
array = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

In [None]:
array

In [None]:
# Shape of an array
print(array.shape)
# Total number of values
print(array.size)
# Number of dimensions
print(array.ndim)
# Type of values
print(array.dtype)

## 1.4 Array Indexing

In [None]:
# Create a 1D array
array = np.array([3,4,2,6,8,1,7,5])
array

In [None]:
# First element (indexing starts at 0)
array[0]

In [None]:
# Last element
array[-1]

In [None]:
# First 3 elements
array[:3]

In [None]:
# Elements from the 4th element to the end
array[3:]

In [None]:
# Elements from the 4th to the 8th
array[3:8]

In [None]:
# Create a 2D array in a shape of 4 rows*5 columns
array = np.random.rand(4,5)
array

In [None]:
# The element in the 1st row and 2nd column
array[0,1]

In [None]:
# The element in the last row and the 2nd column
array[-1,1]

In [None]:
# All elements in the 3rd row
array[2,:]

In [None]:
# Elements from the 2nd to the 3rd row and from the 1st to the 2nd column
array[1:3,0:2]

## 1.5 Boolean comparison and selection

In [None]:
# Comparing each element with 0.5, 
# If greater than 0.5, return True
# If not, return False
array > 0.5

In [None]:
# Return all elements greater than 0.5
array[array>0.5]

In [None]:
# Return all elements greater than 0.5 and smaller than or equal to 0.8
# Use & (and), | (or) and ~ (not), with each condition in parentheses
array[(array>0.5) & (array<=0.8)]
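
Conditions on arrays can be combined with `&` (and), `|` (or) and `~` (not). The sketch below uses a small hypothetical array, `values`, so the results are easy to check by eye. Each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import numpy as np

values = np.array([0.1, 0.55, 0.7, 0.8, 0.95])

# Greater than 0.5 AND smaller than or equal to 0.8
in_range = values[(values > 0.5) & (values <= 0.8)]

# NOT in that range
out_of_range = values[~((values > 0.5) & (values <= 0.8))]

# Smaller than 0.2 OR greater than 0.9
extremes = values[(values < 0.2) | (values > 0.9)]

print(in_range)      # [0.55 0.7  0.8 ]
print(out_of_range)  # [0.1  0.95]
print(extremes)      # [0.1  0.95]
```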

# 2. Pandas DataFrame


- Built on top of NumPy arrays.
- A data structure designed for spreadsheets and tables
- A column in a dataframe is a 1-dimensional labeled array
- Different columns in a DataFrame can store different types of values (integer, string, date/time...)
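
A quick sketch of these points, using a tiny made-up table named `people`: each column has its own data type, and a single column is a Series whose values can be retrieved as a NumPy array.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Mike', 'Emma'],
    'Age': [28, 26],
    'Vegetarian': [True, False],
})

# Each column can hold a different type of value
print(people.dtypes)

# A single column is a 1-dimensional labeled array (a Series)
ages = people['Age']
print(type(ages))

# The underlying values are available as a NumPy array
print(ages.to_numpy())  # [28 26]
```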

## 2.1 Create or Import a DataFrame

In [None]:
# import the package of pandas
import pandas as pd

# Create a DataFrame
df = pd.DataFrame(
    [['Mike', 28, True],
     ['Emma', 26, False],
     ['Jake', 29, False]],
    index=[1, 2, 3],
    columns=['Name', 'Age', 'Vegetarian'])

In [None]:
df

In [None]:
# read the spreadsheet
df = pd.read_csv('other/purchase.csv')

In [None]:
# Preview the first 5 rows
df.head()

## 2.2 DataFrame Indexing and Slicing

### Getting a column by name

In [None]:
# Getting a column by name, return a Series
df['Units']

In [None]:
# Bracket notation also works for column names that contain spaces; returns a Series
df['Unit Cost']

In [None]:
# Passing a list of column names (double brackets) returns a DataFrame
df[['Units']]

### Getting multiple columns

In [None]:
# Getting multiple columns by name
df[['OrderDate','Region','Rep','Units','Unit Cost']]

In [None]:
# Getting columns using the .loc function
df.loc[:,['OrderDate','Rep','Units']]

### Getting rows and columns

In [None]:
# Getting the rows with index labels 0 and 1
df.loc[[0,1],:]

In [None]:
# Getting the 1st to 10th rows
df[0:10]

In [None]:
# Getting the rows with index labels 1, 3 and 5, and the Rep, Item and Total columns
df.loc[[1,3,5],['Rep','Item','Total']]

In [None]:
# Getting OrderDate and Region from the first 10 rows
df[0:10][['OrderDate','Region']]
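
`.loc` selects by index label, not by position. When the index labels are not simply 0, 1, 2, ..., the two are different things. The sketch below uses a small hypothetical table, `df_demo`, with made-up labels and values, and contrasts `.loc` with the position-based `.iloc`.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {'Rep': ['Jones', 'Kivell', 'Gill'], 'Units': [95, 50, 36]},
    index=[10, 20, 30])

# .loc uses the index label and the column name
by_label = df_demo.loc[10, 'Units']

# .iloc uses integer positions (row 0, column 1)
by_position = df_demo.iloc[0, 1]

# Label slices include the end label; position slices exclude the end position
label_slice = df_demo.loc[10:20]
position_slice = df_demo.iloc[0:1]

print(by_label, by_position)                  # 95 95
print(len(label_slice), len(position_slice))  # 2 1
```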

## 2.3 Query in DataFrame

Selecting rows where the Rep name is 'Jones'

In [None]:
# Select all records of Jones
df[df['Rep']=='Jones']

You can use & (and), | (or) and ~ (not) to combine conditions to subset records. Wrap each condition in parentheses.

In [None]:
# Selecting all records of binders bought by Jones
df[(df.Rep=='Jones') & (df.Item=='Binder')]
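
The `|` and `~` operators work the same way as `&`. A minimal sketch on a small hypothetical table named `orders` (the rows are invented for illustration and are not taken from the course data):

```python
import pandas as pd

orders = pd.DataFrame({
    'Rep': ['Jones', 'Jones', 'Smith', 'Gill'],
    'Item': ['Binder', 'Pencil', 'Binder', 'Pen'],
    'Units': [60, 95, 27, 15],
})

# Records where the Rep is Jones OR the Item is Binder
jones_or_binder = orders[(orders['Rep'] == 'Jones') | (orders['Item'] == 'Binder')]

# Records where the Rep is NOT Jones
not_jones = orders[~(orders['Rep'] == 'Jones')]

print(len(jones_or_binder))       # 3
print(not_jones['Rep'].tolist())  # ['Smith', 'Gill']
```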

## 2.4 Summarize DataFrame

In [None]:
# Calculate total cost from all records
df['Total'].sum()

In [None]:
# Calculate the average number of units
df.Units.mean()

In [None]:
df.head()

In [None]:
# Calculate the total cost of pencil orders made by Jones
df[(df.Item=='Pencil')&(df['Rep']=='Jones')].Total.sum()

In [None]:
# Count the number of records for each Rep
df['Rep'].value_counts()

In [None]:
# Get the shape of the DataFrame
df.shape

In [None]:
# Get statistics of numeric values in columns
df.describe()

In [None]:
# Get unique values in a column
df['Rep'].unique()

In [None]:
# Get number of unique values in a column
df['Rep'].nunique()
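
To see how `value_counts`, `unique` and `nunique` relate to each other, here is a short sketch on a hypothetical Series named `reps` with made-up values.

```python
import pandas as pd

reps = pd.Series(['Jones', 'Gill', 'Jones', 'Smith', 'Jones'], name='Rep')

# Number of records for each distinct value, most frequent first
counts = reps.value_counts()

# The distinct values themselves, and how many there are
distinct = reps.unique()
n_distinct = reps.nunique()

print(counts['Jones'])  # 3
print(n_distinct)       # 3
print(counts.sum())     # 5, the total number of records
```

The counts always add up to the number of non-missing records, and `nunique` equals the number of entries in `value_counts`.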

## Useful materials

- Cheatsheet of Numpy Array https://assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf
- Cheatsheet of Data Frame https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf