# Cover Page 
### Student ID: 210003508
### Module Code: GG4257 
### Module Title: Urban Analytics: A Toolkit for Sustainable Urban Development
### Assignment: Lab Assignment No 1 : Handling and GeoVisualisation of Urban Data
### Degree Programme: Geography 
### Deadline Date: 27.02.2025

In submitting this assignment, I hereby confirm that:

I have read the University's statement on Good Academic Practice; that the following work is my own work; and that significant academic debts and borrowings have been properly acknowledged and referenced.


## 1. Introduction 

This section outlines how to replicate the code and access the required data. All data will be available in a **OneDrive folder**, with the code and environment files (.yml) provided in a dedicated GitHub repository.

This report documents the work conducted for **Lab Assignment 1**, covering challenges from **Labs 1, 2, 3, and 4**. Each lab section includes problem descriptions, methods, and results. Code is supplemented with comments and markdown explanations, with screenshots of outputs (e.g., maps and graphs) included to ensure clarity and facilitate replication. To meet GitHub size limits, output cells have been cleared from the notebook.

All data paths in the code assume files are stored in a folder named `Data`. Instructions for setting up the environment and integrating the datasets are provided in the repository's README file.

To replicate this report:
1. Clone the GitHub repository.
2. Download datasets from the OneDrive folder.
3. Follow the README instructions to configure and run the notebook locally.

#### GitHub Repository: 
#### OneDrive Folder: 

## 2. Lab No 1: Python Recap (9 Challenges) 


### Challenge 1: 
 Create a table with the columns and rows required to represent the types of roads in the UK. The table should include headers and hyperlinks.

### Challenge 2:

In the following cell, create the code to calculate the average of three numbers and print the result. There is no need for functions. Comment your code and the process.
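
A minimal sketch of one way to approach this, assuming three illustrative values (the names `a`, `b`, and `c` are hypothetical and not part of the assignment):

```python
# Three example numbers (illustrative values only)
a = 4
b = 8
c = 15

# Add the numbers together, then divide by how many there are
average = (a + b + c) / 3

# Show the result
print(f"Average: {average}")
```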

### Challenge 3:

Write a condition to check if a number is even.
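
One possible condition uses the modulo operator `%`, which returns the remainder of a division. The sketch below assumes a hypothetical example value stored in `number`:

```python
number = 10  # hypothetical example value

# A number is even when dividing it by 2 leaves no remainder
is_even = number % 2 == 0

if is_even:
    print(f"{number} is even")
else:
    print(f"{number} is odd")
```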

### Challenge 4:

Write the Python code to find the factorial of a number using a loop. The factorial of a positive integer is the product of all positive integers less than or equal to it. For example, the factorial of **5** is:

5 * 4 * 3 * 2 * 1, which equals 120.
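
The loop could be sketched as follows, assuming the input is a positive integer held in a hypothetical variable `n`:

```python
n = 5  # hypothetical input; assumed to be a positive integer

# Start from 1 and multiply by every integer from 2 up to n
factorial = 1
for i in range(2, n + 1):
    factorial *= i

print(f"The factorial of {n} is {factorial}")
```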


### Challenge 5:

Help me write the Python code I can use to assign grades based on marks, according to the following table:

| Mark   | Grade |
|--------|-------|
| 81-100 | 20    |
| 61-80  | 15    |
| 41-60  | 10    |
| 20-40  | 5     |
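
A minimal sketch of the mapping, assuming whole-number marks. The function name `grade_for_mark` is hypothetical, and returning `None` for marks outside the table is an assumption, since the table does not say what should happen below 20 or above 100:

```python
def grade_for_mark(mark):
    """Return the grade for a whole-number mark, following the table above."""
    if 81 <= mark <= 100:
        return 20
    elif 61 <= mark <= 80:
        return 15
    elif 41 <= mark <= 60:
        return 10
    elif 20 <= mark <= 40:
        return 5
    # Assumption: marks not covered by the table receive no grade
    return None


# Try the function with a few example marks
for example_mark in (95, 61, 50, 20, 10):
    print(example_mark, "->", grade_for_mark(example_mark))
```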

### Challenge 6:

* Write and use a function to convert Fahrenheit to Celsius. Use the input parameter to let the user enter a number in Fahrenheit and then get the corresponding value in Celsius.
* Go to https://www.w3schools.com/python/exercise.asp?filename=exercise_functions1 and complete the 6 exercises. Functions are a very important part of Python, so we will use extra time to practise them.
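
The conversion in the first bullet could look like the sketch below. The function name `fahrenheit_to_celsius` is hypothetical, and a fixed example value stands in for user input so that the cell runs unattended:

```python
def fahrenheit_to_celsius(fahrenheit):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


# In a notebook, the value could come from the user instead, for example:
# fahrenheit_value = float(input("Enter a temperature in Fahrenheit: "))
fahrenheit_value = 212.0  # fixed example value

celsius_value = fahrenheit_to_celsius(fahrenheit_value)
print(f"{fahrenheit_value} °F is {celsius_value:.1f} °C")
```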


### Challenge 7:

Calculate the mean and standard deviation of elements in a NumPy array. The outcome should have something like ``print(f"Mean: {mean_value}, Standard Deviation: {std_deviation}")``

### Challenge 8:

You have the following dataframes 'sales_data' and 'movie_ratings'

* Display the first 5 rows of the DataFrame 'sales_data'
* Use `loc` to filter and display rows where the "Quantity" is greater than 10 and the "Region" is "North".
* Use `iloc` to display the values in the first row and the first three columns.
* Calculate the total sales for each region and display the result.

* Display the last 3 rows of the DataFrame 'movie_ratings'.
* Calculate and display the average rating for each genre.

### Challenge 9:

* Recreate the explore map, but now using one of the columns to create a choropleth map. Check this link for more information: https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.explore.html

* Get the histogram of the raster you plot in the exercise. Here you can get an example of how to do it: https://rasterio.readthedocs.io/en/latest/topics/plotting.html


## 3. Lab No 2: Data Manipulation and Working with Web Services (2 Challenges)

### Challenge No 1:

1. Using a dictionary, create a DataFrame (table) with at least 4 columns and more than 100 rows. How can you create this amount of data from scratch without defining every single row?
2. Using the appropriate method, create a new DataFrame containing only the first 30 rows and the first 3 columns of the original DataFrame. Name this new DataFrame subset_df.
3. Using the appropriate method, filter the rows from the original dataframe where a numerical attribute(column) is greater than a particular numerical value, and find another categorical attribute that is equal to a specific string or text. Name this new DataFrame filtered_df.
4. Check this website https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mean.html and apply the `mean`, `std` (standard deviation), and `groupby` methods to run a basic statistical analysis of the DataFrame you created.
5. Make sure you comment on your code and describe how you are manipulating the data.
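
For the first step, one way to avoid typing every row is to generate each column programmatically. The sketch below uses only the standard library; the column names, city list, and value ranges are hypothetical and chosen purely for illustration:

```python
import random

random.seed(42)  # fixed seed so the generated values are reproducible

n_rows = 120  # more than the 100 rows required
cities = ["Glasgow", "Edinburgh", "Dundee", "Aberdeen"]  # illustrative categories

# Each key is a column name; each value is a list with one entry per row
data = {
    "id": list(range(1, n_rows + 1)),
    "city": [random.choice(cities) for _ in range(n_rows)],
    "population": [random.randint(1_000, 50_000) for _ in range(n_rows)],
    "mean_income": [round(random.uniform(18_000, 60_000), 2) for _ in range(n_rows)],
}

# With pandas installed, the dictionary could then become a table:
# import pandas as pd
# df = pd.DataFrame(data)
print({column: len(values) for column, values in data.items()})
```

Because every list has the same length, the dictionary maps directly onto a table with one column per key.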


### Challenge No 2:

**Part No 1:**

1. Using the same workflow previously described, now calculate the clustered areas for the GeoPandas GeoDataFrame `gdf_bikes_end`.
2. Make sure you don't have any NaN in your columns, add a CRS, clean up the unnecessary attributes, calculate the cluster values, and plot a map of 4 calculated clusters for the return locations.

**Part No 2:**

1. Using the Glasgow Open Data API (Transit) https://developer.glasgow.gov.uk/api-details#api=traffic&operation=traffic-sensor-locations fetch all the sensor locations in the city.
2. Map the sensors.
3. Find the WorkingZones and Calculate/Map the areas with more and fewer sensors distributed in the city.
4. You will need:
   * Get two separate Geopandas DataFrames, one for the traffic sensors and another one for the WorkingZones.
   * Use `sjoin` (spatial join) https://geopandas.org/en/stable/docs/reference/api/geopandas.sjoin.html to calculate the overlay of sensors and polygons.
   * Use `groupby` https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html to count the number of sensors per WorkingZone.
   * Make sure you add the counts into the WorkingZone polygons of Glasgow so you can create a map of Zones with more and fewer traffic sensors.
   * Of course, you will need extra steps where you manipulate the data and extract what you need, for instance, clipping the Working Zones only for Glasgow.
5. Make sure you comment on your code and describe how you are manipulating the data.


## 4. Lab No 3: Geovisualization Techniques - Data Viz - Part 1 (4 Challenges)

### Challenge 1

**What happens if you have non-numerical attributes?**

Please extend the `data_description` function to only accept numerical columns and calculate mean and counts. The outcome should be a table with Mean and Counts per Column.

> Tip: Check this function in pandas to filter the numerical values. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.select_dtypes.html#pandas.DataFrame.select_dtypes
> Here are some extra resources on creating functions in Python: https://www.geeksforgeeks.org/python-functions/?ref=lbp


### Challenge 2

Now it is your turn to find, read, and process data, then carry out a comprehensive descriptive statistics analysis based on the previous resources and any others you might need. Create insightful visualizations combining both maps and charts to convey meaningful information about a chosen city.

1. Define a problem within the urban environment and choose a dataset related to urban life or city dynamics. This could include data on crime rates, housing prices, transportation, demographics, or any other urban-related dataset.
2. Get the data, ideally using an **API or web services**. It is fine if you need to download the data instead; in that case, describe why you had to use the traditional method.
3. Work with the data cleaning and pre-processing, check for missing values, convert data types, and perform any other necessary preprocessing steps. You have the code for that in this and previous labs.
4. Use Pandas to calculate descriptive statistics such as `mean`, `median`, `standard deviation`, and other relevant measures. Explore `correlations` between different variables (include at least one `univariate` and one `bivariate` plot).
6. Create at least one (or more) interactive map to visualize spatial aspects of the data. For example, plot crime rates across different neighbourhoods or visualize housing prices.
7. Complement with additional charts (line charts, bar charts, etc.) to extend the map and highlight key trends or patterns in the data.
8. As always, document well what you are doing and how you use `descriptive statistics and map visualizations` to extract insights from the data.
> You could try explaining any observed patterns, trends, or correlations, whether they are spatial or non-spatial. **Keep in mind** the defined problem and whether your analysis provides the required insights (Your conclusion could be that you need more data or another type of analysis)

### Challenge 3

1. Go to https://data.cityofnewyork.us/Public-Safety/Motor-Vehicle-Collisions-Crashes/h9gi-nx95/about_data 
2. Get the data for Motor Vehicle Collisions - Crashes Jan 2024. The dataset contains 2.06 million records.
3. Use the API endpoint to map the data (e.g. https://data.cityofnewyork.us/resource/h9gi-nx95.json) 
   ![image.png](attachment:8b7103ca-9191-4b67-97c2-0cfb8b8468fc.png)
4. Customize the map by representing the data by `number_of_persons_killed` and `number_of_cyclist_killed`
5. Finally, calculate descriptive statistics (such as `mean`, `standard deviation`, and other relevant measures) for at least two attributes.
6. Justify/Describe the attribute selection.
7. Plot correlations between the chosen attributes and create `univariate` and/or `multivariate` charts to justify your insights.
   > Please take note that the dataset includes various numerical values. Hence, each student's attribute selection, justification, charts, and maps are expected to vary. 
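
As a hedged sketch of step 3, the request URL for the endpoint can be assembled with Socrata's `$limit` and `$where` query parameters. The limit of 1000 rows and the filter on `number_of_persons_killed` are assumptions chosen for illustration; the code only builds the URL and does not download anything:

```python
from urllib.parse import urlencode

# Endpoint given in the challenge
BASE_URL = "https://data.cityofnewyork.us/resource/h9gi-nx95.json"

# Assumed query: a small sample of crashes with at least one person killed
params = {
    "$limit": 1000,
    "$where": "number_of_persons_killed > 0",
}

# urlencode percent-encodes the parameter names and values
url = f"{BASE_URL}?{urlencode(params)}"
print(url)
```

The resulting URL could then be passed to a reader such as `pandas.read_json(url)` to load the sample into a DataFrame before mapping it.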

### Challenge 4

1. You worked with two modern libraries to map big data. Can you describe the differences between working with Lonboard and Datashader? Which one provides the most exciting functionality, and how do the outcomes from both of them vary?
2. Find a large dataset with at least 5 million records. Consider open datasets, government datasets, or any dataset of interest to you. Ensure the dataset is in a format that can be easily loaded into a Pandas DataFrame (Parquet file or another format).
3. Define a potential problem or scenario for mapping this dataset.
4. Load the dataset into a Pandas DataFrame and explore its structure. Here, **I advise you to take a small portion of it rather than work with the entire table.**
5. Identify key variables of interest that could be effectively visualized using Datashader (https://datashader.org/index.html#). It is fine if the dataset has only locations, but we are aiming for at least one additional variable to represent in the map.
6. Use the previous steps and the Datashader documentation to implement a `hvplot` Map.
7. Discuss any difficulties you encountered while completing this challenge and how you addressed them.
8. As always, provide clear comments and/or citations in your code, explaining each step of the Datashader implementation (**Note: You don't need to run the Datashader pipeline**)
   
9. **For next week**, create a **four-slide presentation** summarizing the problem, data source, dataset, challenges, map, and insights from visualizing the large dataset. **Two slides for Challenge 2** and the **other two for this challenge**. You can also use the Notebooks as a tool to make your presentation. 

## 5. Lab No 4: Geovisualization Techniques - Part 2, Apps (3 Challenges)

### Challenge 1

Once you can launch the dashboard on your machine:

1. Take some time to explore the global and local indicators (e.g. footprint proportion)
2. This work is about urban network analytics, and its main goal is to provide an extensive database of urban networks that can be used for multiple purposes. For example, you could study network density (nodes and edges) vs. building complexity, network density vs. population, and many more correlations across cities.
3. With a critical eye, **select two cities (e.g. Singapore vs Bogotá) and compare two indicators using the tools provided by the dashboard. You can use the linear regression chart to explore them and see how different those cities are.**
4. The authors of this study propose that urban networks can be utilized for more than just measuring connectivity, mobility patterns or linear movement. Their research focuses on integrating multiple dimensions and allowing users to cross-check them with other factors such as built-up areas, population, and points of interest. The authors claim that this offers a more comprehensive approach to urban analytics. **However, with a critical view, what do you think are the top three caveats, pitfalls, or assumptions of this approach?**
5. Address questions 3 and 4 in the following markdown cell. You are welcome to add images (e.g., screenshots) to enrich your reasoning. There is a word limit of 500 words for this.

> Here is another related paper from the same authors you can use to complement your analysis. Yap, W., Stouffs, R. & Biljecki, F. Urbanity: automated modelling and analysis of multidimensional networks in cities. npj Urban Sustain 3, 45 (2023). https://doi.org/10.1038/s42949-023-00125-w   

### Challenge 2

1. Go to this link https://learn.arcgis.com/en/projects/map-and-analyze-the-urban-heat-island-effect/ and follow all the instructions provided. The estimated time for this tutorial is 40 minutes, but I guess it will take you around 1 hour if you are not familiar with ArcGIS Online.

> To continue with this tutorial, you may require certain credits (the AGOL currency) and software extensions which might not be available in the University Portal. Nevertheless, the tutorial provides all the data needed to complete the exercise. Therefore, it is recommended that you go through all the instructions carefully before processing any data.

2. After completing the tutorial, please take a screenshot of your final dashboard and include it in the next markdown cell along with the dashboard link.

![image.png](attachment:7959dbc5-d9c8-4317-abb2-679b60a1a16b.png)


### Challenge 3
 
Now is the time for you to create some choropleth maps. 

1. Go to this portal https://www.spatialdata.gov.scot/geonetwork/srv/eng/catalog.search#/home
2. Get the Scottish Index of Multiple Deprivation (SIMD) 2020 dataset and extract the data only for the city of Edinburgh.
3. Create two static choropleth maps (e.g. `matplotlib`). These maps should represent an attribute you find interesting in the SIMD dataset. Using two different classifier methods, you need to show how the maps appear different even though the data and attributes are the same. Include a clear description of your choice and the difference in the classification method for the attribute chosen (e.g. plotting histograms with breakpoints (bins)). You can find a complete list of classifiers at https://pysal.org/mapclassify/api.html.
4. Finally, create two more interactive maps (e.g. `choropleth_mapbox`), one for Glasgow and one for Edinburgh, to represent the difference in deprivation between the two cities. Pick any of the available attributes.
   > As always, include the appropriate descriptions and code comments where you narrate how you are processing the data, and the insights you get from the results.
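
To illustrate why two classifiers can produce different maps from the same attribute, the sketch below computes break points by hand for an equal-interval and a quantile scheme. It uses only the standard library as a stand-in for the idea behind classifiers such as `EqualInterval` and `Quantiles` in `mapclassify`; the values and the number of classes are hypothetical:

```python
import statistics

# Hypothetical attribute values with a skewed distribution
values = [1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 15, 40]
k = 4  # number of classes (assumed)

# Equal interval: split the range (max - min) into k bins of equal width
low, high = min(values), max(values)
width = (high - low) / k
equal_interval_breaks = [low + width * i for i in range(1, k + 1)]

# Quantiles: place breaks so each class holds a similar number of values
quantile_breaks = statistics.quantiles(values, n=k, method="inclusive") + [high]

print("Equal interval upper bounds:", equal_interval_breaks)
print("Quantile upper bounds:      ", quantile_breaks)
```

With skewed data like this, ten of the twelve values fall into the first equal-interval class, while the quantile breaks spread the values more evenly across classes. That difference is what makes the two choropleth maps look different.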

## 6. Final Remarks (limitations, barriers, and any additional comments)
