# Day 2

# Part 1: Variables and Data types

In [None]:
import pandas as pd
import matplotlib.pyplot as plt


In [None]:
print("Welcome to Day 2: Exploring Data!")
print("------------------------------------------")

Now that our working environment is set up, and we know how to edit and execute Python code through Jupyter Notebook (see {ref}`setup`), we move on to the Python language itself. In this chapter, we introduce the basic concepts, operators, and data types in Python.

Throughout most of the chapter, we are going to cover *data types*, in terms of their properties and their behavior. These include the elementary "atomic" data types, namely: 

* `int` (see {ref}`numbers-int-float`),
* `float` (see {ref}`numbers-int-float`), 
* `bool` (see {ref}`boolean-values`), and
* `None` (see {ref}`none`),

as well as the more complex "collection" data types, namely: 

* `str` (see {ref}`strings`),
* `list` (see {ref}`lists`),
* `dict` (see {ref}`dict`),
* `tuple` (see {ref}`tuples`), and
* `set` (see {ref}`sets`).

(variables)=
## Variables and assignment

Assignment in Python is done using the assignment operator `=`: 

* To the left of the `=` operator we specify the variable *name* of our choice
* To the right of the `=` operator we specify the *value* to be assigned

For example, the following expression assigns the numeric value of `3` to a variable named `x`:
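Here is that assignment as a cell, with a `print` call added to display the value now stored in `x`:

```python
x = 3      # assign the integer 3 to the name x
print(x)   # prints 3
```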

(functions)=
## Functions

Functions are named pieces of code that perform a particular job. We will often be executing:

* Built-in functions
* Functions from the standard library
* Functions from third-party packages (see {ref}`loading-packages`)
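A short sketch of the first two kinds. The specific functions are illustrative choices: the built-ins `len`, `sum`, and `round`, and `sqrt` from the standard-library `math` module. `pd.DataFrame`, used later in this notebook, is an example of the third kind.

```python
import math  # standard-library module: must be imported before use

values = [4, 8, 15]

# Built-in functions are always available, no import needed
n = len(values)              # number of elements: 3
total = sum(values)          # 4 + 8 + 15 = 27
mean = round(total / n, 2)   # 9.0

# Standard-library functions become available after an import
root = math.sqrt(16)         # 4.0

print(n, total, mean, root)  # 3 27 9.0 4.0
```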

(basic-data-types)=
## Data types

```{table} Python data types
:name: data-types

| Data type | Meaning | Divisibility | Mutability | Example |
|---|---|---|---|---|
| `int` | Integer | atomic | immutable | `7` | 
| `float` | Float | atomic | immutable | `3.2` | 
| `bool` | Boolean | atomic | immutable | `True` |
| `None` | None | atomic | immutable | `None` |
| `str` | String | collection | immutable | `"Hello!"` | 
| `list` | List | collection | mutable | `[1,2,3]` |
| `tuple` | Tuple | collection | immutable | `(1,2)` |
| `dict` | Dictionary | collection | mutable | `{"a":2,"b":7}` |
| `set` | Set | collection | mutable | `{"a","b"}` |
```
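A quick way to check these properties yourself: `type()` reports the data type of a value (note that the type of `None` is actually named `NoneType`), and assigning to an element succeeds for a mutable `list` but raises a `TypeError` for an immutable `tuple`. The example values are the ones listed in the table.

```python
# type() reports the data type of each example value from the table
examples = [7, 3.2, True, None, "Hello!", [1, 2, 3], (1, 2), {"a": 2, "b": 7}, {"a", "b"}]
for value in examples:
    print(type(value).__name__, "->", value)

# Mutable: a list can be changed in place
numbers = [1, 2, 3]
numbers[0] = 100
print(numbers)  # [100, 2, 3]

# Immutable: assigning to a tuple element raises a TypeError
pair = (1, 2)
try:
    pair[0] = 100
except TypeError as err:
    print("TypeError:", err)
```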

(arithmetic-operators)=
### Arithmetic operators

The ordinary arithmetic operators in Python are given in {numref}`arithmetic-ops`.

```{table} Arithmetic operators in Python
:name: arithmetic-ops

| Operator | Meaning |
|---|---|
| `+` | Addition |
| `-` | Subtraction |
| `*` | Multiplication |
| `/` | Division |
| `**` | Exponentiation |
| `//` | Floor division |
| `%` | Modulus |
```
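Applying each operator to the same two integers makes the difference between `/`, `//`, and `%` concrete (the values 7 and 2 are arbitrary):

```python
a, b = 7, 2
print(a + b)   # 9
print(a - b)   # 5
print(a * b)   # 14
print(a / b)   # 3.5  (true division always returns a float)
print(a ** b)  # 49
print(a // b)  # 3    (floor division rounds down)
print(a % b)   # 1    (remainder of 7 divided by 2)

# Floor division and modulus fit together: (a // b) * b + a % b == a
print(a // b * b + a % b == a)  # True
```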

### Increment assignment

Another commonly used Python operator is the *increment assignment* operator `+=`. The increment assignment is a shortcut to addition combined with assignment, i.e., `x+=y` is a shorter way to express `x=x+y`. For example:

In [None]:
x = 10
x += 5
x

(boolean-values)=
## Boolean values (`bool`)

[Boolean values](https://en.wikipedia.org/wiki/Boolean_data_type) represent one of two states, "true" or "false". Accordingly, the boolean data type in Python can have just one of two possible values, `True` and `False`. Boolean values can be created by literally typing `True` and `False`: 

In [None]:
x = True

In [None]:
x

In [None]:
type(x)

Boolean values are also produced by *conditions*, which compare values using *conditional operators* such as `>` (greater than). The conditional operators in Python are summarized in {numref}`conditional-ops`.

```{table} Conditional operators in Python
:name: conditional-ops

| Operator | Meaning |
|---|---|
| `==` | Equal |
| `!=` | Not equal |
| `<` | Less than |
| `<=` | Less than or equal |
| `>` | Greater than |
| `>=` | Greater than or equal |
```

In [None]:
x = 11
x > 10

In [None]:
x <= 10
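Running all six operators against the same value shows that every comparison evaluates to a `bool`. Note that `==` compares two values, whereas a single `=` assigns one:

```python
x = 11
print(x == 10)  # False
print(x != 10)  # True
print(x < 10)   # False
print(x <= 10)  # False
print(x > 10)   # True
print(x >= 10)  # True

# A comparison result is an ordinary bool value that can be stored
is_large = x > 10
print(type(is_large))  # <class 'bool'>
```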

```{note}
Working with strings is less relevant for our purposes in this book. Nevertheless, here is a list of useful string methods to get an impression of the built-in methods for strings in Python:

* `.strip`—Remove spaces from start and end
* `.lower`—Convert to lowercase
* `.upper`—Convert to uppercase
* `.title`—Convert to titlecase
* `.startswith(pattern)`—Check if string starts with `pattern`
* `.endswith(pattern)`—Check if string ends with `pattern`
* `.find(pattern)`—Find the index of `pattern` within the string
* `sep.join([str1, str2, ...])`—Join strings `str1`, `str2`, etc., using the `sep` string as separator 
```
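Applied to a sample string (chosen here just for illustration), the methods from the note behave as follows. They return new strings rather than modifying the original, because strings are immutable:

```python
s = "  Hello, World  "

clean = s.strip()                      # remove leading/trailing whitespace
print(clean)                           # Hello, World
print(clean.lower())                   # hello, world
print(clean.upper())                   # HELLO, WORLD
print("satellite data".title())        # Satellite Data
print(clean.startswith("Hello"))       # True
print(clean.endswith("!"))             # False
print(clean.find("World"))             # 7 (find returns -1 if not found)
print("-".join(["2024", "01", "10"]))  # 2024-01-10
```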

# Part 2: Introduction to Pandas


In [None]:
print("\nPart 2: Loading data with pandas")
print("--------------------------------")


We create a simple sample dataset here; in practice, you would normally load real data from a file. It simulates a CSV file of temperature readings from a satellite over time.


In [None]:
data = {
    'Date': ['2015-01-01', '2016-01-01', '2017-01-01', '2018-01-01', '2019-01-01',
             '2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01', '2024-01-10'],
    'Temperature': [22.5, 23.1, 23.8, 22.9, 24.0, 25.2, 24.8, 23.5, 22.0, 21.5],
    'Location': ['Forest', 'Forest', 'Forest', 'Forest', 'Forest', 
                 'Urban', 'Urban', 'Urban', 'Urban', 'Urban']
}


In [None]:
data

In [None]:
# Create DataFrame
df = pd.DataFrame(data)


In [None]:
# Display the first few rows
print("\nFirst 5 rows of our satellite temperature data:")
print(df.head())


In [None]:
# Display basic statistics
print("\nBasic statistics about our temperature data:")
print(df.describe())


In [None]:
# Convert Date to datetime format
df['Date'] = pd.to_datetime(df['Date'])


In [None]:
print("\nInformation about our dataset:")
print(f"Number of readings: {len(df)}")
print(f"Average temperature: {df['Temperature'].mean():.1f}°C")
print(f"Maximum temperature: {df['Temperature'].max()}°C")
print(f"Minimum temperature: {df['Temperature'].min()}°C")


# Part 3: Basic Data Visualization


In [None]:
print("\n\nPart 3: Visualizing data")
print("------------------------")


In [None]:
# Create a basic line plot of temperature over time
print("Creating a line plot of temperature over time...")


In [None]:
plt.figure(figsize=(10, 6))
plt.plot(df['Date'], df['Temperature'], marker='o', linestyle='-', color='red')
plt.xlabel('Date')
plt.ylabel('Temperature (°C)')
plt.title('Satellite Temperature Readings Over Time')
plt.grid(True)
plt.xticks(rotation=45)

# Save the plot in the same cell that draws it: Jupyter's inline backend
# closes the figure when the cell finishes, so saving from a later cell
# would write an empty image.
plt.tight_layout()
plt.savefig('temperature_plot.png')
print("Plot saved as 'temperature_plot.png'")
plt.show()


In [None]:
# Create a plot comparing temperatures by location
print("\nCreating a plot comparing temperatures by location...")


In [None]:
# Calculate average temperature by location
location_temps = df.groupby('Location')['Temperature'].mean()
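`groupby` follows a split-apply-combine pattern: split the rows by location, apply `mean` to each group, and combine the results. A plain-Python sketch of what `df.groupby('Location')['Temperature'].mean()` computes on this sample data (the variable names are hypothetical, and pandas does this far more efficiently):

```python
# Same values as the `data` dictionary defined earlier in this notebook
temperatures = [22.5, 23.1, 23.8, 22.9, 24.0, 25.2, 24.8, 23.5, 22.0, 21.5]
locations = ['Forest'] * 5 + ['Urban'] * 5

# Split: collect the temperatures belonging to each location
groups = {}
for loc, temp in zip(locations, temperatures):
    groups.setdefault(loc, []).append(temp)

# Apply + combine: average each group, sorted by key as groupby does by default
means = {loc: sum(vals) / len(vals) for loc, vals in sorted(groups.items())}
print({loc: round(m, 2) for loc, m in means.items()})  # Forest ≈ 23.26, Urban ≈ 23.4
```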


In [None]:
plt.figure(figsize=(8, 6))
location_temps.plot(kind='bar', color=['green', 'gray'])
plt.xlabel('Location Type')
plt.ylabel('Average Temperature (°C)')
plt.title('Average Temperature by Location Type')
plt.grid(axis='y')

# Save in the same cell that draws the plot; Jupyter closes figures
# when a cell finishes, so a later savefig would produce an empty image.
plt.tight_layout()
plt.savefig('location_temperature_plot.png')
print("Plot saved as 'location_temperature_plot.png'")
plt.show()


## Geometries: Points, Linestrings and Polygons

Spatial **vector** data can consist of different types, and the 3 fundamental types are:

![](simple_features_3_text.svg)

* **Point** data: represents a single point in space.
* **Line** data ("LineString"): represents a sequence of points that form a line.
* **Polygon** data: represents a filled area.

Each of them can also be combined into multi-part geometries (see https://shapely.readthedocs.io/en/stable/manual.html#geometric-objects for an extensive overview).

In [None]:
## Importing geospatial data library
import pandas as pd
import geopandas

We can use the GeoPandas library to read many common geospatial file formats, such as Shapefiles and GeoJSON, with the `geopandas.read_file` function. Under the hood, GeoPandas relies on an interface to GDAL/OGR (`pyogrio` by default in recent versions, or `fiona`).

In [None]:
countries = geopandas.read_file("ne_110m_admin_0_countries.zip")

In the `countries` dataset, the individual geometry objects are Polygons (or MultiPolygons, for countries made up of several parts):

In [None]:
print(countries.geometry[2])

Let's import some other datasets with different types of geometry objects.

A dataset about cities in the world, consisting of Point data:

In [None]:
cities = geopandas.read_file("ne_110m_populated_places.zip")

And a dataset of rivers in the world where each river is a (multi-)line:

In [None]:
rivers = geopandas.read_file("ne_50m_rivers_lake_centerlines.zip")

In [None]:
print(rivers.geometry[0])

### The `shapely` library

The individual geometry objects are provided by the [`shapely`](https://shapely.readthedocs.io/en/stable/) library.

In [None]:
type(countries.geometry[0])

In [None]:
from shapely.geometry import Point, Polygon, LineString

In [None]:
p = Point(0, 0)

In [None]:
print(p)

In [None]:
polygon = Polygon([(1, 1), (2,2), (2, 1)])

In [None]:
polygon.area

In [None]:
polygon.distance(p)
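To demystify `polygon.area` and `polygon.distance(p)`, here is a plain-Python sketch of the underlying geometry for the same triangle and point: the shoelace formula for the area, and the minimum distance from the point to any edge. This is only an illustration, not how shapely itself works (it delegates to the GEOS library). The helper functions are hypothetical, and the distance helper assumes the point lies outside the polygon (shapely returns 0 for points inside).

```python
import math

def shoelace_area(coords):
    """Area of a simple polygon from its list of vertices (shoelace formula)."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment (a == b)
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def point_polygon_distance(p, coords):
    """Distance to the nearest edge; assumes p lies OUTSIDE the polygon."""
    n = len(coords)
    return min(point_segment_distance(p, coords[i], coords[(i + 1) % n]) for i in range(n))

triangle = [(1, 1), (2, 2), (2, 1)]
print(shoelace_area(triangle))                   # 0.5
print(point_polygon_distance((0, 0), triangle))  # about 1.4142, i.e. sqrt(2): nearest vertex is (1, 1)
```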

<div class="alert alert-info" style="font-size:120%">

**REMEMBER**: <br>

Single geometries are represented by `shapely` objects:

* If you access a single geometry of a GeoDataFrame, you get a shapely geometry object
* Those objects have similar functionality to geopandas objects (GeoDataFrame/GeoSeries). For example:
    * `single_shapely_object.distance(other_point)` -> distance between two points
    * `geodataframe.distance(other_point)` ->  distance for each point in the geodataframe to the other point

</div>

## Plotting our different layers together

In [None]:
# Draw the country outlines, then overlay rivers and cities on the same axes
ax = countries.plot(edgecolor='k', facecolor='none', figsize=(15, 10))
rivers.plot(ax=ax)
cities.plot(ax=ax, color='red')
# Zoom in on a region by limiting longitude (x) and latitude (y)
ax.set(xlim=(-20, 60), ylim=(-40, 40))

In [None]:
countries.crs

In [None]:
countries.plot()

In [None]:
# Remove Antarctica, as the Mercator projection cannot represent the poles
countries = countries[(countries['NAME'] != "Antarctica")]
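The reason is visible in the Mercator formula. A minimal sketch using the *spherical* Mercator northing $y = R \ln\tan(\pi/4 + \varphi/2)$ shows $y$ growing without bound as the latitude $\varphi$ approaches 90°. The spherical form is an assumption for illustration; EPSG:3395 uses the ellipsoidal variant, which behaves the same way near the poles.

```python
import math

R = 6378137.0  # WGS84 semi-major axis in meters, used here as a sphere radius

def mercator_y(lat_deg):
    """Northing (meters) of the spherical Mercator projection for a latitude in degrees."""
    phi = math.radians(lat_deg)
    return R * math.log(math.tan(math.pi / 4 + phi / 2))

# The closer to the pole, the faster y grows; at exactly 90° it is undefined
for lat in [45, 80, 85, 89, 89.9, 89.99]:
    print(f"{lat:>6}° -> y = {mercator_y(lat) / 1e6:8.2f} million m")
```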

In [None]:
countries_mercator = countries.to_crs(epsg=3395)  # or .to_crs("EPSG:3395")

In [None]:
countries_mercator.plot()

In [None]:
# Import the districts dataset
districts = geopandas.read_file("INDIA_DISTRICTS.geojson")

In [None]:
# Check the CRS information
districts.crs

In [None]:
# Show the first rows of the GeoDataFrame
districts

In [None]:
# Plot the districts dataset
districts.plot()

In [None]:
# Calculate the area of all districts
# Step 1: Project to an equal-area CRS. Web Mercator (EPSG:3857) preserves
# shapes but inflates areas away from the equator, so it is not suitable here.
# EPSG:6933 (WGS 84 / NSIDC EASE-Grid 2.0 Global) is an equal-area CRS in meters.
districts_projected = districts.to_crs(epsg=6933)

# Step 2: Calculate the area in square meters (EPSG:6933 uses meters)
districts['area_m2'] = districts_projected.geometry.area

# Step 3: Convert to square kilometers
districts['area_km2'] = districts['area_m2'] / 1e6

print(districts['area_km2'])
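Why the choice of CRS matters for areas: Web Mercator (EPSG:3857) stretches distances by a factor of $\sec\varphi$ in both directions, so areas are inflated by roughly $\sec^2\varphi$. A minimal sketch of that factor (spherical approximation, an assumption for illustration) shows that across India's latitude range of roughly 8°N to 37°N, areas computed in Web Mercator would come out about 2% to 57% too large.

```python
import math

def mercator_area_scale(lat_deg):
    """Approximate factor by which spherical Mercator inflates areas at a given latitude."""
    return 1 / math.cos(math.radians(lat_deg)) ** 2

for lat in [0, 8, 20, 30, 37, 60]:
    print(f"latitude {lat:>2}°: areas inflated x{mercator_area_scale(lat):.2f}")
```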

# Let's Practice with Gujarat

In [None]:
# Import the Gujarat districts dataset
GUJ_districts = geopandas.read_file("gujarat.geojson")

In [None]:
# Show the first rows of the GeoDataFrame
GUJ_districts.head()

In [None]:
# Plot the districts dataset
GUJ_districts.plot()

In [None]:
print("\nEnd of Day 2 - You've learned how to load and visualize data!")
