[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/stefanlessmann/ESMT_IML/blob/main/notebooks/p1_introduction_and_GenAI_demo.ipynb)

# Practical 1: Unsupervised learning and AI Peer-Programming
Our first lecture introduced the different types of machine learning, their use cases, and underlying data structures. Because the course focuses on supervised learning, we use this introductory practice session to shed at least some light on *unsupervised learning*.

Further, we use this notebook to illustrate the capabilities of Generative AI (GenAI). More specifically, we examine how GenAI (e.g., ChatGPT) can help us develop Python code. To that end, the notebook provides a set of prompts for generating Python code. Your task is to try out these prompts using an AI of your choice and experiment with the generated code.

We suggest you begin with the prepared prompts. Afterwards, you are most welcome to adjust them and examine how changes in your prompts change the generated code and, by extension, the effectiveness of the GenAI support.

**Disclaimer:** The prompts were tested with different versions of ChatGPT and should work reasonably well. That said, there is no guarantee that the provided prompts lead an AI to produce ready-to-use code. One learning goal of this session is to *study how GenAI can help us*; we should not expect it to do all the work.

Let's move on with the first prompt.

## Prompt 1: Cluster analysis
Generate a Python script that demonstrates unsupervised machine learning using cluster analysis. Your script should perform the following tasks:
1. Generate a synthetic data set. To facilitate visualization, restrict the data to two features. The synthetic data points should stem from different clusters, to ensure suitability of the data for a clustering demo. 
2. Visualize the data using a scatter plot. Use different symbols to distinguish data points from different clusters.
3. Demonstrate how to run a clustering algorithm on the data.
4. Visualize the output of the clustering algorithm in a second plot. This plot should depict the true cluster membership of each data point and the cluster to which the clustering algorithm assigned it. Users should easily see whether the algorithm assigned data points to the correct cluster.

Make sure the code is ready to be executed. For example, import all necessary libraries. Annotate the code using comments for better comprehensibility. Finally, the code should allow users to easily adjust the difficulty of the clustering task.



#### Copy the generated code into the code cell below and execute it:

In [None]:
# Copy and run the generated code here

#### Code inspection
As noted, the above prompt should work reasonably well, meaning that, at this point, you should have code for a clustering demo and the corresponding results in front of you.

Assuming so, note how our prompt explicitly asked for a means to control the difficulty of the clustering task. Review the generated code and find out how you can achieve this. Adjust the code to increase the complexity of the clustering task and rerun it to verify that everything worked as expected. Feel free to repeat this exercise multiple times with various levels of task complexity. This should give you a good understanding of when the clustering algorithm works well and when it fails.
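
For orientation, the sketch below shows one way such a difficulty control might look. It is not the output of any particular AI. It assumes scikit-learn is installed and uses the `cluster_std` argument of `make_blobs` as the difficulty knob; the helper `clustering_agreement` and the fixed cluster centers are hypothetical choices made for illustration. Agreement between true and assigned clusters is measured with the adjusted Rand index, which does not depend on how the algorithm numbers its clusters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Illustrative assumption: three fixed cluster centers in two dimensions
CENTERS = [(-5, -5), (0, 5), (5, -5)]


def clustering_agreement(cluster_std, seed=0):
    """Generate 2D blobs with the given spread, cluster them, and score the result."""
    # A larger cluster_std makes the clusters overlap more (harder task)
    X, y_true = make_blobs(
        n_samples=300, centers=CENTERS, cluster_std=cluster_std, random_state=seed
    )
    # Run k-means with the true number of clusters
    kmeans = KMeans(n_clusters=len(CENTERS), n_init=10, random_state=seed)
    y_pred = kmeans.fit_predict(X)
    # 1.0 means perfect agreement; values near 0 mean chance-level agreement
    return adjusted_rand_score(y_true, y_pred)


# Compare an easy, a medium, and a hard version of the task
for std in (0.5, 2.0, 5.0):
    print(f"cluster_std={std}: adjusted Rand index = {clustering_agreement(std):.2f}")
```

Larger values of `cluster_std` make the clusters overlap more, so the score should tend to drop as the task gets harder. Your generated code may well expose a different parameter for the same purpose.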

#### Code revision
It is safe to assume that the generated code was not perfect. Did you spot any issues that you think warrant improvement?
Go back to your AI and try to make it generate better code. You can either revise the prompt or continue the 'discussion' with the AI and ask it to make improvements.

If you did not spot any issues, you could feed the generated code back into the AI and ask it to suggest improvements. To do that, you could start your prompt like so:

*Below is a snippet of Python code aimed at demonstrating unsupervised learning using clustering. Review the code and make suggestions for improvement.*
``` Python
{Copy the generated code into the prompt}
```


In [35]:
# Space for the improved AI-generated code that demonstrates clustering


## Prompt 2: Dimensionality reduction
The lecture also introduced another form of unsupervised learning, namely dimensionality reduction. With our second prepared prompt, we try to generate code that demonstrates how to perform dimensionality reduction in Python.

**Suggested prompt:**


Write a Python script that demonstrates unsupervised learning in the form of dimensionality reduction. Specifically, your script should perform the following tasks:

1. Generate a synthetic, high-dimensional data set. The number of features should be a parameter that users can control easily. The data should comprise a user-defined number of clusters.

2. Demonstrate the application of an algorithm for dimensionality reduction. Select a suitable algorithm. Your algorithm should output a two-dimensional projection of the original data to facilitate visualization.

3. Visualize the two-dimensional data set by means of a scatter plot. 

Make sure the generated code is ready to be executed by importing all relevant libraries. Annotate your code for better readability.
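
If you want something to compare the AI's answer against, here is a minimal sketch of what a response to this prompt might contain. It assumes scikit-learn and matplotlib are available and picks principal component analysis (PCA) as the dimensionality reduction algorithm; the AI may well choose a different one. The variable names and parameter values are illustrative assumptions.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# User-controlled settings (illustrative values)
n_features = 10   # dimensionality of the synthetic data
n_clusters = 4    # number of clusters in the synthetic data

# Step 1: generate high-dimensional data that contains clusters
X, y = make_blobs(
    n_samples=500, centers=n_clusters, n_features=n_features, random_state=42
)

# Step 2: project the data onto its first two principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print("Shape before:", X.shape, "| shape after:", X_2d.shape)
print("Share of variance kept:", round(pca.explained_variance_ratio_.sum(), 3))

# Step 3: scatter plot of the projection
# The colors show the true clusters for illustration only; PCA never sees them
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="viridis", s=15)
plt.xlabel("Principal component 1")
plt.ylabel("Principal component 2")
plt.title("2D PCA projection of the synthetic data")
plt.show()
```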

In [39]:
# Copy and execute generated code here:



## Programming task:
If the prepared prompt worked as intended, your synthetic high-dimensional data set should comprise clusters. The 2D projection of the data should also show these clusters. Let's see whether we can also find these clusters in the original, high-dimensional data using a clustering algorithm.

Specifically, drawing on the code generated in response to our first prompt, your task is to write a Python script that clusters your high-dimensional data set. 

We suggest you try to solve this task without the help of GenAI.

*Additional task for the experts:* Try to find a way to verify that the algorithm found the correct number of clusters.
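
As a hint for the additional task, one possible starting point (among several) is to compare clustering results for different numbers of clusters using the silhouette score. The sketch below assumes scikit-learn, uses k-means as the clustering algorithm, and generates its own stand-in data with `make_blobs`; in your solution you would use the high-dimensional data set from the previous step instead. All names and parameter values are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Stand-in data (assumption): replace with your own high-dimensional data set
X, _ = make_blobs(n_samples=500, centers=4, n_features=10, random_state=42)

# Cluster the data for several candidate numbers of clusters
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # The silhouette score lies in [-1, 1]; higher values indicate
    # more compact and better separated clusters
    scores[k] = silhouette_score(X, labels)
    print(f"k={k}: silhouette score = {scores[k]:.3f}")

# Candidate with the highest score
best_k = max(scores, key=scores.get)
print("Number of clusters with the highest silhouette score:", best_k)
```

The candidate with the highest score often, but not always, coincides with the true number of clusters, so treat the result as evidence rather than proof.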

In [40]:
# Your solution goes here