### Tutorial 3 - Data Manipulation and Visualization

Packages to install: 
* `pandas`
* `matplotlib`
* `plotly`

### Question 1

Based on the data below, do the following:  

```
# Sample data for df1
data1 = {
    'user_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie']
}

# Sample data for df2
data2 = {
    'user_id': [2, 3, 4],
    'age': [25, 30, 35]
}
```
    
1. Import necessary libraries
2. Load the data into a dataframe
3. Concatenate the dataframes (vertically)
4. Concatenate the dataframes (horizontally)
5. Merge the dataframes on the column `user_id` using inner join.
6. Join the dataframes with `user_id` as the index. This time, do an outer join.

In [None]:
import pandas as pd
import numpy as np

# Sample data for df1
data1 = {
    'user_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie']
}

# Sample data for df2
data2 = {
    'user_id': [2, 3, 4],
    'age': [25, 30, 35]
}
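
One possible way to work through steps 2 to 6 is sketched below. The names `df1` and `df2` follow the comments in the sample data; `vertical`, `horizontal`, `inner`, and `outer` are illustrative names chosen here, not part of the exercise.

```python
import pandas as pd

# Sample data from the question
data1 = {'user_id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']}
data2 = {'user_id': [2, 3, 4], 'age': [25, 30, 35]}

# Step 2: load each dictionary into a DataFrame
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Step 3: vertical concatenation stacks the rows;
# columns missing from one frame are filled with NaN
vertical = pd.concat([df1, df2], ignore_index=True)

# Step 4: horizontal concatenation aligns on the row index
# and places the columns side by side (user_id appears twice)
horizontal = pd.concat([df1, df2], axis=1)

# Step 5: inner merge keeps only user_id values present in both frames
inner = pd.merge(df1, df2, on='user_id', how='inner')

# Step 6: outer join on the index keeps every user_id from either frame
outer = df1.set_index('user_id').join(df2.set_index('user_id'), how='outer')

print(vertical, horizontal, inner, outer, sep='\n\n')
```

Note that horizontal concatenation pairs rows by position in the index, not by `user_id`, so it does not line up matching users the way `merge` and `join` do.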

### Question 2

Based on the data below, do the following:  

```
grades = {
    'StudentID': ['S1', 'S2', 'S3'],
    'Math': [88, 75, 92],
    'Science': [84, 88, 93]
}

more_grades = {
    'StudentID': ['S2', 'S3', 'S4'],
    'Literature': [78, 85, 88],
    'Art': [82, 90, 85]
}

activities = {
    'StudentID': ['S1', 'S2', 'S4'],
    'Club': ['Robotics', 'Math Club', 'Art Club']
}
```
    
1. Import necessary libraries
2. Load the data into a dataframe
3. Append `grades` and `more_grades` row-wise.
4. Merge `grades` and `activities` on `StudentID` to see grades and club participation.
5. Use join() to merge `grades` and `activities` based on `StudentID` column as index.

In [None]:
import pandas as pd

grades = pd.DataFrame({
    'StudentID': ['S1', 'S2', 'S3'],
    'Math': [88, 75, 92],
    'Science': [84, 88, 93]
})

more_grades = pd.DataFrame({
    'StudentID': ['S2', 'S3', 'S4'],
    'Literature': [78, 85, 88],
    'Art': [82, 90, 85]
})

activities = pd.DataFrame({
    'StudentID': ['S1', 'S2', 'S4'],
    'Club': ['Robotics', 'Math Club', 'Art Club']
})

In [None]:
appended = pd.concat([grades, more_grades])
appended

In [None]:
merge1 = pd.merge(grades, activities, on = 'StudentID', how = 'inner')
merge1

In [None]:
new_grades = grades.set_index('StudentID')
new_activities = activities.set_index('StudentID')

In [None]:
new_grades.join(new_activities)

### Question 3

Based on the data below, do the following:  
```
months = range(1, 13)
sales = [142, 150, 157, 169, 200, 210, 218, 230, 241, 260, 275, 291]
profits = [25, 28, 34, 31, 40, 45, 48, 50, 54, 60, 65, 70]
```
1. Generate a plot to see the trend of sales over the year
2. Compare the sales data with the profit data over the same period.
3. Create a bar chart to visualize the comparison between sales and profits more clearly.
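
The three plots could be drawn with `matplotlib` roughly as follows. This is a minimal sketch: the side-by-side layout, the figure size, and the axis names (`ax_trend`, `ax_compare`, `ax_bar`) are choices made here, not requirements of the question.

```python
import numpy as np
import matplotlib.pyplot as plt

# Data from the question
months = range(1, 13)
sales = [142, 150, 157, 169, 200, 210, 218, 230, 241, 260, 275, 291]
profits = [25, 28, 34, 31, 40, 45, 48, 50, 54, 60, 65, 70]

# Three panels side by side (layout is an assumption of this sketch)
fig, (ax_trend, ax_compare, ax_bar) = plt.subplots(1, 3, figsize=(15, 4))

# Task 1: sales trend over the year
ax_trend.plot(months, sales, marker='o')
ax_trend.set_title('Monthly Sales')
ax_trend.set_xlabel('Month')
ax_trend.set_ylabel('Sales')

# Task 2: sales and profits on the same axes
ax_compare.plot(months, sales, marker='o', label='Sales')
ax_compare.plot(months, profits, marker='s', label='Profits')
ax_compare.set_title('Sales vs Profits')
ax_compare.set_xlabel('Month')
ax_compare.legend()

# Task 3: grouped bar chart, one pair of bars per month
x = np.array(months)
width = 0.4
ax_bar.bar(x - width / 2, sales, width=width, label='Sales')
ax_bar.bar(x + width / 2, profits, width=width, label='Profits')
ax_bar.set_title('Sales vs Profits (bar chart)')
ax_bar.set_xlabel('Month')
ax_bar.set_xticks(x)
ax_bar.legend()

fig.tight_layout()
```

In a notebook the figure renders inline at the end of the cell; in a plain script, call `plt.show()` afterwards.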

### Question 4

The dataset consists of monthly Air Quality Index (AQI) values for four cities over one year, structured as follows:

Months of the year (January to December):  
AQI for City A  
AQI for City B   
AQI for City C  
AQI for City D  

Based on the data below:

```
import numpy as np

months = np.array(["January", "February", "March", "April", "May", "June", 
                   "July", "August", "September", "October", "November", "December"])
aqi_city_a = np.random.randint(50, 150, size=12)
aqi_city_b = np.random.randint(60, 160, size=12)
aqi_city_c = np.random.randint(40, 140, size=12)
aqi_city_d = np.random.randint(70, 170, size=12)
```

1. Plot a line graph for comparison of AQI trends across all cities
2. Generate a scatter plot to compare AQI values of City A vs City B
3. Plot a histogram of AQI for City A
4. Create a Bar Chart for comparing the average AQI across cities. The bar chart must be in sky blue color.
5. Place all the graphs in a (2 x 2) canvas figure.
6. Save the figure as "combine.png"

Note: Set the figure's `figsize` to (10, 6).  
Note: Use the `ggplot` style sheet.

In [None]:
import numpy as np

months = np.array(["January", "February", "March", "April", "May", "June", 
                   "July", "August", "September", "October", "November", "December"])
aqi_city_a = np.random.randint(50, 150, size=12)
aqi_city_b = np.random.randint(60, 160, size=12)
aqi_city_c = np.random.randint(40, 140, size=12)
aqi_city_d = np.random.randint(70, 170, size=12)

In [19]:
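# figure() and plot() live in matplotlib.pyplot, not in the top-level matplotlib package
import matplotlib.pyplot as plt

fig1 = plt.figure(figsize=(10, 6))  # figsize required by the note in the question

# The keyword argument is `color`; `colour` is not accepted
plt.plot(months, aqi_city_a, marker='x', linestyle='-', color='pink', label='city a')
plt.legend()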
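
The six tasks of Question 4 can be combined in a single 2 x 2 figure. The sketch below is one possible arrangement; the seed, the number of histogram bins, the tick rotation, and the names `fig`, `axes`, and `out_path` are assumptions made here for illustration.

```python
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # assumption: fixed seed so the random AQI values are repeatable

months = np.array(["January", "February", "March", "April", "May", "June",
                   "July", "August", "September", "October", "November", "December"])
aqi_city_a = np.random.randint(50, 150, size=12)
aqi_city_b = np.random.randint(60, 160, size=12)
aqi_city_c = np.random.randint(40, 140, size=12)
aqi_city_d = np.random.randint(70, 170, size=12)

plt.style.use("ggplot")

# Task 5: one (2 x 2) canvas with the required figsize
fig, axes = plt.subplots(2, 2, figsize=(10, 6))

# Task 1: line graph of AQI trends for all four cities
for label, values in [("City A", aqi_city_a), ("City B", aqi_city_b),
                      ("City C", aqi_city_c), ("City D", aqi_city_d)]:
    axes[0, 0].plot(months, values, marker="o", label=label)
axes[0, 0].set_title("AQI trends")
axes[0, 0].tick_params(axis="x", labelrotation=45)
axes[0, 0].legend(fontsize="small")

# Task 2: scatter plot of City A against City B
axes[0, 1].scatter(aqi_city_a, aqi_city_b)
axes[0, 1].set_title("City A vs City B")
axes[0, 1].set_xlabel("AQI City A")
axes[0, 1].set_ylabel("AQI City B")

# Task 3: histogram of City A
axes[1, 0].hist(aqi_city_a, bins=6)
axes[1, 0].set_title("AQI distribution, City A")

# Task 4: average AQI per city, drawn in sky blue
averages = [aqi_city_a.mean(), aqi_city_b.mean(), aqi_city_c.mean(), aqi_city_d.mean()]
axes[1, 1].bar(["City A", "City B", "City C", "City D"], averages, color="skyblue")
axes[1, 1].set_title("Average AQI")

# Task 6: save the combined figure
fig.tight_layout()
out_path = Path("combine.png")
fig.savefig(out_path)
```

Calling `plt.style.use("ggplot")` before creating the figure applies the style to all four panels at once.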

### Question 5

#### Plotly Express (3D visuals only)

The dataset will simulate geographical and environmental data for a set of locations. Each location has the following attributes:

Longitude (X)  
Latitude (Y)  
Elevation (Z)  
Temperature (Color Scale)  

Use the code below to generate the synthetic dataset:

```
import pandas as pd
import numpy as np

# Generate a dataset
np.random.seed(42) # Ensure reproducibility
longitude = np.random.uniform(-180, 180, 100)
latitude = np.random.uniform(-90, 90, 100)
elevation = np.random.uniform(0, 8848, 100) # Upper bound: height of Mount Everest in metres
temperature = np.random.uniform(-50, 50, 100) # Temperature range

df = pd.DataFrame({
    'Longitude': longitude,
    'Latitude': latitude,
    'Elevation': elevation,
    'Temperature': temperature
})
```

Based on the dataset, perform the following operations:  

1. Plot geographical points with their elevation and use temperature to color-code the points. Use `scatter_3d`.
2. A 3D line chart could be used to represent a path or journey. Simulate a path by sorting the data by elevation and plotting the result. Use `line_3d`.
3. 3D surface plots are useful for representing topographical surfaces or any other kind of 3D surface data.  
   Create a simplified surface plot simulating a mountain range using `plotly.graph_objects`. For reference, see https://plotly.com/python/3d-surface-plots/#topographical-3d-surface-plot  
   For this example, generate a structured grid of points and compute a hypothetical elevation.
   Use the data below for the third task:

```
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
x, y = np.meshgrid(x, y)
z = np.sin(np.sqrt(x**2 + y**2))
```

In [14]:
import pandas as pd
import numpy as np

# Generate a dataset
np.random.seed(42) # Ensure reproducibility
longitude = np.random.uniform(-180, 180, 100)
latitude = np.random.uniform(-90, 90, 100)
elevation = np.random.uniform(0, 8848, 100) # Upper bound: height of Mount Everest in metres
temperature = np.random.uniform(-50, 50, 100) # Temperature range

df = pd.DataFrame({
    'Longitude': longitude,
    'Latitude': latitude,
    'Elevation': elevation,
    'Temperature': temperature
})

In [None]:
# Task 1: 3D Scatter Plot
import plotly.express as px
import plotly.io as pio
fig1 = px.scatter_3d(df, 
                    x='Longitude', 
                    y='Latitude', 
                    z='Elevation',
                    color='Temperature',
                    color_continuous_scale='Viridis',
                    title='Geographical Points with Elevation (Color: Temperature)')

fig1.update_layout(scene=dict(
                  xaxis_title='Longitude',
                  yaxis_title='Latitude',
                  zaxis_title='Elevation'),
                  width=900, height=700)

pio.show(fig1)


In [None]:
# Task 2: 3D Line Chart
# Sort the data by elevation to create a path

df_sorted = df.sort_values('Elevation')

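# Use px (imported in the Task 1 cell); `pl` is not defined.
# px.line_3d has no color_continuous_scale parameter, and color='Temperature'
# would split the path into one line per distinct value, so both are dropped.
fig2 = px.line_3d(df_sorted,
                x='Longitude',
                y='Latitude',
                z='Elevation',
                title='Path Simulation (sorted by Elevation)')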

fig2.update_layout(scene=dict(
                  xaxis_title='Longitude',
                  yaxis_title='Latitude',
                  zaxis_title='Elevation'),
                  width=900, height=700)

fig2.show()

In [None]:
# Task 3: 3D Surface Plot
# Using the data provided for this task
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
x, y = np.meshgrid(x, y)
z = np.sin(np.sqrt(x**2 + y**2))

# Create 3D surface plot with plotly.graph_objects (`pl` is not defined)
import plotly.graph_objects as go

fig3 = go.Figure(data=[go.Surface(z=z, x=x, y=y, colorscale='Viridis')])

fig3.update_layout(title='3D Surface Plot - Simulated Mountain Range',
                  scene=dict(
                      xaxis_title='X',
                      yaxis_title='Y',
                      zaxis_title='Elevation'),
                  width=900, height=700)

fig3.show()