# Analysis and Visualization of Complex Agro-Environmental Data
---

In [38]:
import math


## Types of data and data collections in Python 

The following data types are available in base Python:
* **boolean**
* **integer**
* **float**
* **string**
* **none**
* complex
* object

Built-in Python data types to store collections of data:

* **list** (python built-in data type)
* **dictionary** (python built-in data type)
* **tuple** (python built-in data type)
* set (python built-in data type)

Numpy data types
* **arrays**

Pandas data types
* **DataFrames**
* **Series**

Here we will focus on the types in **bold**.

Try to connect these types to the data types mentioned in Lesson 02.

###  Numerical or Quantitative (taking the mean makes sense)
* Discrete
    * Integer (int): stored exactly
* Continuous
    * Float (float): stored similarly to scientific notation; allows decimal places but with limited precision

**Integer**

In [39]:
type(24)

int

In [40]:
type(0)

int

In [41]:
type(-3)

int

In [42]:
# Compute the mean of a list of integers
numbers = [2, 3, 4, 5]
print(sum(numbers)/len(numbers))
type(sum(numbers)/len(numbers))

3.5


float

**Float**

In [43]:
3/5

0.6

In [44]:
6*10**(-1)

0.6000000000000001
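
The trailing digit in the result above appears because 0.1 has no exact binary floating-point representation, so `6*10**(-1)` and `3/5` end up slightly different. A minimal sketch of comparing floats with a tolerance instead of `==`, using `math.isclose` from the standard library (the variable names `a` and `b` are only for illustration):

```python
import math

a = 6 * 10**(-1)   # 0.1 is rounded in binary, and the error shows up after multiplying
b = 3 / 5

print(a == b)               # exact comparison of the two floats
print(math.isclose(a, b))   # comparison within a small relative tolerance
```

When two floats come from different calculations, a tolerance-based comparison is usually the safer choice.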

In [45]:
type(math.pi)

float

In [46]:
type(4.0)

float

In [47]:
# Try taking the mean
numbers = [math.pi, 3/5, 4.1]
type(sum(numbers)/len(numbers))

float

### Categorical or Qualitative
* Nominal
    * Boolean (bool)
    * String (str)
    * None (NoneType)
* Ordinal
    * Only defined by how you use the data
    * Often important when creating visuals
    * Lists can hold ordinal information because they have indices

**Boolean**

In [48]:
# Boolean
type(True)

bool

In [49]:
# Boolean
if 6 < 5:
    print("Yes!")
else:
    print("No")

No


In [50]:
type(None is None)

bool

In [51]:
myList = [True, 6<5, 1==3, None is None]
for element in myList:
    print(type(element))

<class 'bool'>
<class 'bool'>
<class 'bool'>
<class 'bool'>


In [52]:
print(sum(myList)/len(myList))
type(sum(myList)/len(myList))

0.5


float
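
The mean above can be computed because `bool` is a subclass of `int` in Python: `True` counts as 1 and `False` as 0. The mean of a list of booleans is therefore the proportion of `True` values. A short sketch (the names `flags`, `n_true` and `proportion` are illustrative):

```python
flags = [True, 6 < 5, 1 == 3, None is None]

print(isinstance(True, int))      # bool is a subclass of int

n_true = sum(flags)               # number of True values
proportion = n_true / len(flags)  # share of True values in the list
print(n_true, proportion)
```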

**String**

In [53]:
type("This sentence makes sense")

str

In [54]:
type("math.pi")

str

In [55]:
strList = ['dog', 'koala', 'goose']
sum(strList)/len(strList)

TypeError: unsupported operand type(s) for +: 'int' and 'str'
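
A mean is not defined for nominal data, but counting how often each category occurs is. A minimal sketch using `collections.Counter` from the standard library; the list is the one above with one extra, hypothetical `'dog'` added so that the counts differ:

```python
from collections import Counter

# Hypothetical sample: the list above plus a repeated 'dog'
strList = ['dog', 'koala', 'goose', 'dog']

counts = Counter(strList)        # maps each category to its frequency
print(counts)
print(counts.most_common(1))     # the most frequent category (the mode)
```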

**NoneType**

Used to define a null variable or object. `None` is also the value a function returns when it has no return statement.

In [31]:
# None
type(None)

NoneType

In [32]:
# None
x = None
type(x)

NoneType

In [33]:
def no_return():
    pass  # Placeholder for future code; avoids a syntax error where an empty block is not allowed.

print(no_return())

None


In [34]:
def no_data():
    pass  # Placeholder for future code; avoids a syntax error where an empty block is not allowed.

print(no_data())

None


In [35]:
var = None
if var is None:
  print("None")
else:
  print("Not None")

None


In [36]:
noneList = [None]*5
sum(noneList)/len(noneList) # can't compute the mean

TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
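
When `None` marks missing values among real measurements, one option is to drop the `None` entries before computing the mean. A minimal sketch with a hypothetical list of values (the data and variable names are assumptions for illustration):

```python
# Hypothetical measurements, with None marking missing values
values = [4.0, None, 5.0, None, 9.0]

valid = [v for v in values if v is not None]           # keep only real numbers
mean = sum(valid) / len(valid) if valid else None      # avoid dividing by zero
print(len(valid), mean)
```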

### Built-in python data types to store collections of data
**Lists**

Used to store multiple items in a single variable.

In [56]:
# List
myList = [1, 1.1, "This is a sentence", None]
for element in myList:
    print(type(element))

<class 'int'>
<class 'float'>
<class 'str'>
<class 'NoneType'>


In [59]:
sum(myList)/len(myList) # can't compute the mean

TypeError: unsupported operand type(s) for +: 'float' and 'str'

In [61]:
# List
myList = [1, 2, 3]
for element in myList:
    print(type(element))
sum(myList)/len(myList) # note that this outputs a float

<class 'int'>
<class 'int'>
<class 'int'>


2.0

In [62]:
myList = ['third', 'first', 'medium', 'small', 'large']
myList[0]

'third'

In [63]:
myList.sort()
myList

['first', 'large', 'medium', 'small', 'third']
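
Note that `sort()` orders strings alphabetically, not by meaning. For ordinal data, one possible approach is to store the intended order in a list and sort by each item's position in it. A sketch, assuming a hypothetical size scale and hypothetical observations:

```python
# Hypothetical ordinal scale: a category's position in the list defines its rank
size_order = ['small', 'medium', 'large']
observations = ['large', 'small', 'medium', 'small']

# Sort by rank in size_order rather than alphabetically
sorted_obs = sorted(observations, key=size_order.index)
print(sorted_obs)
```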

**Dictionaries**

Dictionaries are used to store data values in key:value pairs.

A dictionary is a collection that is ordered (insertion order is preserved as of Python 3.7), changeable, and does not allow duplicate keys.

In [64]:
mydict = {
  "Landuse": "Agriculture",
  "Cover": 24.5,
  "year": 2015
}
print(mydict)

{'Landuse': 'Agriculture', 'Cover': 24.5, 'year': 2015}


In [65]:
# Alternatively, using the dict() constructor:
mydict = dict(Landuse = "Agriculture", Cover = 24.5, year = 2015)
print(mydict)

{'Landuse': 'Agriculture', 'Cover': 24.5, 'year': 2015}


In [66]:
for element in mydict:
    print(type(element))

<class 'str'>
<class 'str'>
<class 'str'>


In [67]:
type(mydict["year"])

int
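
Iterating over a dictionary directly yields its keys, which is why the loop above printed only `str`. To inspect the values as well, `items()` returns key and value pairs, and `get()` looks up a key without raising an error when it is missing. A brief sketch (the names `value_types` and `missing` are illustrative):

```python
mydict = {"Landuse": "Agriculture", "Cover": 24.5, "year": 2015}

# Map each key to the type name of its value
value_types = {key: type(value).__name__ for key, value in mydict.items()}
print(value_types)

# get() returns None (or a chosen default) instead of raising KeyError
missing = mydict.get("county")
print(missing)
```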

In [68]:
print(mydict["Landuse"])

Agriculture


In [69]:
mydict.keys()

dict_keys(['Landuse', 'Cover', 'year'])

In [70]:
mydict.values()

dict_values(['Agriculture', 24.5, 2015])

In [71]:
# add keys
mydict["county"] = "Odemira"
print(mydict)

{'Landuse': 'Agriculture', 'Cover': 24.5, 'year': 2015, 'county': 'Odemira'}


In [72]:
# Change values
mydict["year"] = 2020
print(mydict)

{'Landuse': 'Agriculture', 'Cover': 24.5, 'year': 2020, 'county': 'Odemira'}


In [73]:
# Duplicate keys are not allowed: the last value assigned to a key is kept
mydict = {
  "Landuse": "Agriculture",
  "Cover": 24.5,
  "year": 2015,
  "year": 2020
}
print(mydict)

{'Landuse': 'Agriculture', 'Cover': 24.5, 'year': 2020}


In [74]:
mydict2 = {'landuse': ['forest', 'pasture', 'cropland', 'urban'], 'cover': [4, 5, 60.54, 8.42]}
print(mydict2)

{'landuse': ['forest', 'pasture', 'cropland', 'urban'], 'cover': [4, 5, 60.54, 8.42]}


**Tuples**

Tuples are used to store multiple items in a single variable.

A tuple is a collection which is ordered and unchangeable.

In [75]:
mytuple = ("Agriculture", 24.5, 2015)
print(mytuple)

('Agriculture', 24.5, 2015)


In [76]:
# Alternative using tuple() constructor
mytuple = tuple(("Agriculture", 24.5, 2015))
print(mytuple)

('Agriculture', 24.5, 2015)


In [77]:
for element in mytuple:
    print(type(element))

<class 'str'>
<class 'float'>
<class 'int'>


In [78]:
len(mytuple)

3
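
"Unchangeable" means that the items of a tuple cannot be reassigned after it is created. A minimal sketch of what happens on an attempted assignment, and of building a new tuple instead (the names `changed` and `updated` are illustrative):

```python
mytuple = ("Agriculture", 24.5, 2015)

try:
    mytuple[2] = 2020          # item assignment is not supported for tuples
    changed = True
except TypeError as err:
    changed = False
    print("TypeError:", err)

# To "change" a value, build a new tuple from the pieces of the old one
updated = mytuple[:2] + (2020,)
print(updated)
```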

In [79]:
# one item tuple (remember the comma at the end)
thistuple = ("apple",)
print(type(thistuple))

<class 'tuple'>


In [80]:
thistuple = ("apple")
print(type(thistuple))

<class 'str'>


### Numpy data types

**Arrays**

A NumPy array is a grid of values, all of the same type, and is **indexed by a tuple** of nonnegative integers. The number of dimensions (`ndim`) is sometimes called the **rank** of the array; the **shape** of an array is a tuple of integers giving the size of the array along each dimension. Common data types (`dtype`) within arrays include booleans, integers (e.g. `int64`), floats (e.g. `float64`) and complex numbers.

In [81]:
import numpy as np

# create a 1D array
a1 = np.array([1,5,24,67])
print(a1)

[ 1  5 24 67]


In [82]:
# create a 2D array
a2 = np.array([[1,5,24,67], [3,6,26,85], [6,13,54,71]])
print(a2)

[[ 1  5 24 67]
 [ 3  6 26 85]
 [ 6 13 54 71]]


In [83]:
# create a 3D array
a3 = np.array([[[1,5,24,67], [3,6,26,85], [6,13,54,71], [2,9,31,54]]])
print(a3)

[[[ 1  5 24 67]
  [ 3  6 26 85]
  [ 6 13 54 71]
  [ 2  9 31 54]]]


In [84]:
print(type(a1))
print(a1.dtype)
print(a2.shape)
print(a3.strides)

<class 'numpy.ndarray'>
int64
(3, 4)
(128, 32, 8)
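
The attributes printed above are related. `strides` gives the number of bytes to step along each dimension, so it follows from the shape and the size of one item. The sketch below rebuilds `a3` with an explicit `dtype=np.int64` (an assumption, so that the item size is 8 bytes on every platform) and shows indexing with a tuple:

```python
import numpy as np

a3 = np.array([[[1, 5, 24, 67], [3, 6, 26, 85], [6, 13, 54, 71], [2, 9, 31, 54]]],
              dtype=np.int64)

print(a3.ndim)       # number of dimensions
print(a3.shape)      # size along each dimension
print(a3.itemsize)   # bytes per element
print(a3.strides)    # bytes to step along each dimension

# Indexing with a tuple: block 0, row 2, column 1
element = a3[0, 2, 1]
same_element = a3[(0, 2, 1)]   # a3[0, 2, 1] is shorthand for this
print(element, same_element)
```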


In [85]:
a4 = np.array([1.7, 2.0], dtype=np.int64) # force a given data type (decimals are truncated, not rounded)
print(a4.dtype)
print(a4)

int64
[1 2]
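
Converting floats to an integer type drops the decimal part, so 1.7 becomes 1 above. If rounding to the nearest integer is intended, one option is to round first and then convert. A sketch with a hypothetical array that also includes a negative value:

```python
import numpy as np

values = np.array([1.7, 2.0, -1.7])   # hypothetical values

truncated = values.astype(np.int64)          # decimal part dropped (toward zero)
rounded = np.rint(values).astype(np.int64)   # rounded to nearest integer, then converted

print(truncated)
print(rounded)
```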


In [86]:
azeros = np.zeros((3,4))
aones = np.ones((2,5))
afull = np.full((3,2),4)
aempty = np.empty((3,2)) # uninitialized array: the values are arbitrary and differ between runs
print(azeros)
print(aones)
print(afull)
print(aempty)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
[[4 4]
 [4 4]
 [4 4]]
[[6.23042070e-307 4.67296746e-307]
 [1.69121096e-306 2.22520559e-306]
 [1.89146896e-307 7.56571288e-307]]


In [87]:
aseq = np.arange(0,25,5) # from 0 up to (but not including) 25 in steps of 5 (similar to range)
aseq2 = np.linspace(0,25,5) # 5 equally spaced values from 0 to 25, both endpoints included
print(aseq)
print(aseq2)

[ 0  5 10 15 20]
[ 0.    6.25 12.5  18.75 25.  ]
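
The two functions treat the upper limit differently: `np.arange` excludes the stop value, while `np.linspace` includes it by default. A small sketch of two ways to get a sequence from 0 to 25 in steps of 5 with the endpoint included (the names `aseq_incl` and `aseq_lin` are illustrative):

```python
import numpy as np

aseq = np.arange(0, 25, 5)           # stop value 25 is excluded
aseq_incl = np.arange(0, 25 + 5, 5)  # extend the stop by one step to include 25
aseq_lin = np.linspace(0, 25, 6)     # 6 points give a spacing of 5, endpoints included

print(aseq)
print(aseq_incl)
print(aseq_lin)
```

Note that `np.arange` with integer arguments returns integers, whereas `np.linspace` returns floats.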


### Pandas data types

**DataFrames**

The DataFrame is the main data structure in pandas: a two-dimensional table of data in which the rows typically represent cases and the columns represent variables. Data types may vary among columns. Pandas also has a one-dimensional data structure called a **Series**, which we will encounter when accessing a single column of a DataFrame (see below).

When you import a table in CSV format using the pandas function `pd.read_csv()`:
```
df = pd.read_csv(url)
type(df)

pandas.core.frame.DataFrame
```
... the imported table will be a DataFrame.
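
Since no specific `url` is given here, the following sketch reads the same kind of table from an in-memory string instead. The CSV text is hypothetical and `io.StringIO` stands in for a file or URL:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file or URL
csv_text = "landuse,cover\nforest,4\npasture,5\ncropland,60.54\nurban,8.42\n"

df = pd.read_csv(io.StringIO(csv_text))   # read_csv accepts paths, URLs and file-like objects
print(type(df))
print(df.shape)
```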


In [88]:
import pandas as pd

mydict2 = {'landuse': ['forest', 'pasture', 'cropland', 'urban'], 'cover': [4, 5, 60.54, 8.42]}
mydf = pd.DataFrame(mydict2)
print(mydf)

    landuse  cover
0    forest   4.00
1   pasture   5.00
2  cropland  60.54
3     urban   8.42


In [89]:
mydf.head(2)

   landuse  cover
0   forest    4.0
1  pasture    5.0


In [90]:
col = mydf.columns
print(col)
list(col)

Index(['landuse', 'cover'], dtype='object')


['landuse', 'cover']

In [91]:
print(mydf.dtypes) # Note: pandas typically stores Python strings with the object dtype; columns with mixed types are also stored as object.

landuse     object
cover      float64
dtype: object


In [92]:
print(mydf.shape)

(4, 2)


**Series**

A pandas Series is a one-dimensional, labeled data structure that can hold any type of data, such as strings, numbers, datetimes, lists and dictionaries.
Each column of a DataFrame is a Series.

In [93]:
mydict = dict(Landuse = "Agriculture", Cover = 24.5, year = 2015)
myseries = pd.Series(mydict)
print(myseries)

Landuse    Agriculture
Cover             24.5
year              2015
dtype: object
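
The `object` dtype appears above because the values have different types (a string, a float and an integer). The dictionary keys become the index labels, so individual values can be retrieved by label. A short sketch (the name `cover_value` is illustrative):

```python
import pandas as pd

myseries = pd.Series({"Landuse": "Agriculture", "Cover": 24.5, "year": 2015})

print(myseries.dtype)            # mixed value types are stored as object
print(list(myseries.index))      # dictionary keys become index labels

cover_value = myseries["Cover"]  # access a value by its label
print(cover_value)
```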


In [94]:
print(myseries.shape)

(3,)


In [95]:
a1 = np.array([1,5,24,67])
myseries2 = pd.Series(a1)
print(myseries2)

0     1
1     5
2    24
3    67
dtype: int64


In [96]:
myseries3 = pd.Series(10, index=[1,2,3,4,5])
print(myseries3)

1    10
2    10
3    10
4    10
5    10
dtype: int64


In [97]:
myseries4 = mydf['cover']
type(myseries4)

pandas.core.series.Series
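
Because `cover` is a quantitative variable, taking its mean makes sense, and a Series provides this directly. The sketch below also contrasts single brackets, which return a Series, with double brackets, which return a one-column DataFrame (the names `cover`, `cover_frame` and `mean_cover` are illustrative):

```python
import pandas as pd

mydf = pd.DataFrame({'landuse': ['forest', 'pasture', 'cropland', 'urban'],
                     'cover': [4, 5, 60.54, 8.42]})

cover = mydf['cover']          # single brackets: a Series
cover_frame = mydf[['cover']]  # double brackets: a DataFrame with one column

mean_cover = cover.mean()      # mean of the quantitative column
print(type(cover), type(cover_frame))
print(mean_cover)
```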
