# Analysis and Visualization of Complex Agro-Environmental Data
---

In [3]:
import math


## Types of data and data collections in Python 

The following data types are available in base Python:
* **boolean**
* **integer**
* **float**
* **string**
* **none**
* complex
* object

Built-in python data types to store collections of data

* **list** (python built-in data type)
* **dictionary** (python built-in data type)
* **tuple** (python built-in data type)
* set (python built-in data type)

Numpy data types
* **arrays**

Pandas data types
* **DataFrames**
* **Series**

Here we will focus on the **bolded** ones.

Try to connect these types to the data types mentioned in Lesson 02.

### Numerical or Quantitative (taking the mean makes sense)
* Discrete
    * Integer (int): stored exactly
* Continuous
    * Float (float): stored in a form similar to scientific notation; allows decimal places but with limited precision
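The difference between exact integers and approximate floats can be seen in a short sketch. The variable names `big` and `total` are only illustrative; `math.isclose` is the standard-library way to compare floats within a tolerance.

```python
import math

# Integers are stored exactly, however large they get
big = 2**100
print(big + 1 - big)  # 1

# Floats are binary approximations, so some decimal values cannot be stored exactly
total = 0.1 + 0.2
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True
```

When comparing floats, a tolerance-based check such as `math.isclose` is usually safer than `==`.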

**Integer**

In [None]:
type(24)

In [None]:
type(0)

In [None]:
type(-3)

In [None]:
# compute the mean of a list of integers (note that the result is a float)
numbers = [2, 3, 4, 5]
print(sum(numbers)/len(numbers))
type(sum(numbers)/len(numbers))

**Float**

In [None]:
3/5

In [None]:
6*10**(-1)

In [None]:
type(math.pi)

In [None]:
type(4.0)

In [None]:
# Try taking the mean
numbers = [math.pi, 3/5, 4.1]
type(sum(numbers)/len(numbers))

### Categorical or Qualitative
* Nominal
    * Boolean (bool)
    * String (str)
    * None (NoneType)
* Ordinal
    * Only defined by how you use the data
    * Often important when creating visuals
    * Lists can hold ordinal information because they have indices
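As a rough illustration of how a list can carry ordinal information, the sketch below uses a hypothetical `size_order` list whose positions define the ranking, and sorts some made-up observations by that ranking instead of alphabetically.

```python
# Hypothetical size classes, listed from smallest to largest
size_order = ["small", "medium", "large"]

# Made-up observations for illustration only
observations = ["large", "small", "medium", "small"]

# Plain sorting is alphabetical and ignores the intended order
alphabetical = sorted(observations)

# Sorting by position in size_order respects the ordinal scale
ordinal = sorted(observations, key=size_order.index)

print(alphabetical)  # ['large', 'medium', 'small', 'small']
print(ordinal)       # ['small', 'small', 'medium', 'large']
```

The same idea matters for visuals: categories plotted in alphabetical order often hide the ordering the data actually have.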

**Boolean**

In [None]:
# Boolean
type(True)

In [23]:
# Boolean: the condition 6 < 5 is False, so nothing is printed
if 6 < 5:
    print("Yes!")

In [None]:
type(None is None)

In [None]:
myList = [True, 6<5, 1==3, None is None]
for element in myList:
    print(type(element))

In [None]:
print(sum(myList)/len(myList))
type(sum(myList)/len(myList))
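The mean of a list of booleans can be computed because, in Python, `bool` is a subclass of `int`: `True` counts as 1 and `False` as 0. A minimal sketch, using a hypothetical `flags` list with the same values as above:

```python
# bool is a subclass of int: True counts as 1 and False as 0
flags = [True, False, False, True]

print(isinstance(True, int))  # True
print(sum(flags))             # 2

# The "mean" of booleans is the proportion of True values
proportion_true = sum(flags) / len(flags)
print(proportion_true)        # 0.5
```

Whether that proportion is meaningful depends on the data; for nominal variables it is a frequency, not an average value.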

**String**

In [None]:
type("This sentence makes sense")

In [None]:
type("math.pi")

In [None]:
strList = ['dog', 'koala', 'goose']
sum(strList)/len(strList) # raises TypeError: can't compute the mean of strings

**NoneType**

Used to define a null variable or object. `None` is also the value a function returns when it has no return statement.

In [None]:
# None
type(None)

In [None]:
# None
x = None
type(x)

In [None]:
def no_return():
    pass  # placeholder for future code; avoids an error where an empty block is not allowed

print(no_return())

In [None]:
def no_data():
    pass  # placeholder for future code; avoids an error where an empty block is not allowed

print(no_data())

In [None]:
var = None
if var is None:
    print("None")
else:
    print("Not None")

In [None]:
noneList = [None]*5
sum(noneList)/len(noneList) # can't compute the mean

### Built-in python data types to store collections of data
**Lists**

Used to store multiple items in a single variable.

In [None]:
# List
myList = [1, 1.1, "This is a sentence", None]
for element in myList:
    print(type(element))

In [None]:
sum(myList)/len(myList) # can't compute the mean

In [None]:
# List
myList = [1, 2, 3]
for element in myList:
    print(type(element))
sum(myList)/len(myList) # note that this outputs a float

In [None]:
myList = ['third', 'first', 'medium', 'small', 'large']
myList[0]

In [None]:
myList.sort()
myList

**Dictionaries**

Dictionaries are used to store data values in key:value pairs.

A dictionary is a collection that is ordered (as of Python 3.7, it preserves insertion order), changeable, and does not allow duplicate keys.
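A minimal sketch of working with key:value pairs, using a hypothetical `plot` dictionary: keys come back in insertion order, `items()` yields the pairs, and `get()` gives a fallback for a missing key.

```python
# Hypothetical record for illustration
plot = {"Landuse": "Agriculture", "Cover": 24.5}
plot["year"] = 2015  # new keys are appended at the end

# Keys come back in insertion order
print(list(plot))  # ['Landuse', 'Cover', 'year']

# items() yields (key, value) pairs
for key, value in plot.items():
    print(key, "->", value)

# get() returns a default instead of raising KeyError for a missing key
print(plot.get("county", "unknown"))  # unknown
```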

In [None]:
mydict = {
  "Landuse": "Agriculture",
  "Cover": 24.5,
  "year": 2015
}
print(mydict)

In [None]:
# Alternatively, using the dict() constructor:
mydict = dict(Landuse = "Agriculture", Cover = 24.5, year = 2015)
print(mydict)

In [None]:
# iterating over a dictionary yields its keys, so this prints the type of each key
for element in mydict:
    print(type(element))

In [None]:
type(mydict["year"])

In [None]:
print(mydict["Landuse"])

In [None]:
mydict.keys()

In [None]:
mydict.values()

In [None]:
# add keys
mydict["county"] = "Odemira"
print(mydict)

In [None]:
# Change values
mydict["year"] = 2020
print(mydict)

In [None]:
# duplicate keys are not allowed: the last value assigned to a key overwrites earlier ones
mydict = {
  "Landuse": "Agriculture",
  "Cover": 24.5,
  "year": 2015,
  "year": 2020
}
print(mydict)

In [None]:
mydict2 = {'landuse': ['forest', 'pasture', 'cropland', 'urban'], 'cover': [4, 5, 60.54, 8.42]}
print(mydict2)

**Tuples**

Tuples are used to store multiple items in a single variable.

A tuple is a collection which is ordered and unchangeable.

In [None]:
mytuple = ("Agriculture", 24.5, 2015)
print(mytuple)

In [None]:
# Alternative using tuple() constructor
mytuple = tuple(("Agriculture", 24.5, 2015))
print(mytuple)

In [None]:
for element in mytuple:
    print(type(element))

In [None]:
len(mytuple)

In [None]:
# one item tuple (remember the comma at the end)
thistuple = ("apple",)
print(type(thistuple))

In [None]:
thistuple = ("apple")
print(type(thistuple))
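The "unchangeable" property of tuples can be checked directly. In this sketch, the `record` tuple is a hypothetical example; assigning to one of its items raises a `TypeError`, so a modified copy has to be built instead.

```python
record = ("Agriculture", 24.5, 2015)

try:
    record[2] = 2020  # item assignment is not supported on tuples
except TypeError as err:
    print("TypeError:", err)

# To "change" a tuple, build a new one from its parts
updated = record[:2] + (2020,)
print(updated)  # ('Agriculture', 24.5, 2020)
```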

### Numpy data types

**Arrays**

A NumPy array is a grid of values, all of the same type, **indexed by a tuple** of non-negative integers. The number of dimensions is the **rank** of the array (`ndim`); the **shape** of an array is a tuple of integers giving its size along each dimension. Common element types (dtypes) include booleans, signed and unsigned integers, floats and complex numbers, each available in several sizes (for example `int64` or `float32`).
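A small sketch of rank, shape and tuple indexing, using a hypothetical 3 x 4 array named `grid`:

```python
import numpy as np

grid = np.array([[1, 5, 24, 67],
                 [3, 6, 26, 85],
                 [6, 13, 54, 71]])

print(grid.ndim)     # 2: number of dimensions (the rank)
print(grid.shape)    # (3, 4): size along each dimension
print(grid[1, 2])    # 26: row 1, column 2 (counting from 0)
print(grid[(1, 2)])  # 26: same element, index written as an explicit tuple
```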

In [None]:
import numpy as np

# create a 1D array
a1 = np.array([1,5,24,67])
print(a1)

In [None]:
# create a 2D array
a2 = np.array([[1,5,24,67], [3,6,26,85], [6,13,54,71]])
print(a2)

In [None]:
# create a 3D array
a3 = np.array([[[1,5,24,67], [3,6,26,85], [6,13,54,71], [2,9,31,54]]])
print(a3)

In [None]:
print(type(a1))
print(a1.dtype)
print(a2.shape)
print(a3.strides)

In [None]:
a4 = np.array([1.7, 2.0], dtype=np.int64) # force a given data type (the floats are truncated to integers)
print(a4.dtype)
print(a4)

In [None]:
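azeros = np.zeros((3,4)) # 3x4 array filled with 0.0
aones = np.ones((2,5)) # 2x5 array filled with 1.0
afull = np.full((3,2),4) # 3x2 array filled with the value 4
aempty = np.empty((3,2)) # 3x2 uninitialized array: its contents are arbitrary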
print(azeros)
print(aones)
print(afull)
print(aempty)

In [None]:
aseq = np.arange(0,25,5) # sequence from 0 up to (but not including) 25 in steps of 5, like range()
aseq2 = np.linspace(0,25,5) # 5 equally spaced values from 0 to 25, both endpoints included
print(aseq)
print(aseq2)

### Pandas data types

**DataFrames**

The DataFrame is the main data structure in pandas: a two-dimensional table of data in which the rows typically represent cases and the columns represent variables. Data types may vary among columns. Pandas also has a one-dimensional data structure called a **Series**, which we will encounter when accessing a single column of a DataFrame (see below).

When you import a table in CSV format using the pandas function `pd.read_csv()`:
```
df = pd.read_csv(url)
type(df)

pandas.core.frame.DataFrame
```
... the imported table will be a DataFrame.
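Since `url` is not defined in the snippet above, here is a self-contained sketch that reads CSV text from memory instead. The `csv_text` content is made up for illustration; `io.StringIO` simply makes the string behave like a file.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file or URL
csv_text = """landuse,cover
forest,4
pasture,5
cropland,60.54
urban,8.42
"""

df = pd.read_csv(io.StringIO(csv_text))
print(type(df))
print(df.shape)  # (4, 2): four rows, two columns
```

With a real dataset, the argument to `pd.read_csv()` would be a file path or URL instead of the `StringIO` object.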


In [None]:
import pandas as pd

mydict2 = {'landuse': ['forest', 'pasture', 'cropland', 'urban'], 'cover': [4, 5, 60.54, 8.42]}
mydf = pd.DataFrame(mydict2)
print(mydf)

In [None]:
mydf.head(2)

In [None]:
col = mydf.columns
print(col)
list(col)

In [None]:
# Note: text columns are typically reported as object (or as a dedicated string dtype,
# depending on the pandas version); columns with mixed types are stored with the object dtype.
print(mydf.dtypes)
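To see how column contents affect the reported dtype, the sketch below builds a small, made-up DataFrame named `mixed`. The column names are hypothetical.

```python
import pandas as pd

mixed = pd.DataFrame({
    "value": [1, "two", 3.0],  # mixed Python types in one column
    "count": [1, 2, 3],        # integers only
    "cover": [4, 5, 60.54],    # integers and floats together become floats
})
print(mixed.dtypes)
```

A column holding mixed Python types falls back to `object`, while purely numeric columns get a numeric dtype.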

In [None]:
print(mydf.shape)

**Series**

A pandas Series is a one-dimensional labeled data structure that can hold any type of data, such as strings, numbers, datetimes, lists and dictionaries.
A column of a DataFrame is equivalent to a Series.

In [None]:
mydict = dict(Landuse = "Agriculture", Cover = 24.5, year = 2015)
myseries = pd.Series(mydict)
print(myseries)

In [None]:
print(myseries.shape)

In [None]:
a1 = np.array([1,5,24,67])
myseries2 = pd.Series(a1)
print(myseries2)

In [None]:
myseries3 = pd.Series(10, index=[1,2,3,4,5])
print(myseries3)

In [None]:
myseries4 = mydf['cover']
type(myseries4)