---
---

# WELCOME TO PYTHON COURSE (24.09)

---
---

# STUDENT QUESTIONS ANSWERED

---
---

# Results STARTUP (-> 10:00)

---
---

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data
data = {
    'EmployeeID': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'FirstName': ['John', 'Jane', 'Michael', 'Emily', 'David', 'Alice', 'Robert', 'Laura', 'James'],
    'LastName': ['Doe', 'Smith', 'Johnson', 'Williams', 'Brown', 'Davis', 'Wilson', 'Moore', 'Taylor'],
    'Role': ['Software Engineer', 'Data Scientist', 'UX Designer', 'Product Manager', 'QA Engineer', 'DevOps Engineer', 'Backend Developer', 'Frontend Developer', 'HR Manager'],
    'Department': ['Development', 'Data Science', 'Design', 'Management', 'Quality Assurance', 'Operations', 'Development', 'Development', 'Human Resources'],
    'Salary': [70000, 75000, 68000, 80000, 65000, 72000, 71000, 69000, 73000],
    'StartDate': ['2022-01-15', '2021-06-01', '2023-03-10', '2020-11-01', '2022-07-15', '2021-09-20', '2023-02-05', '2022-10-25', '2020-12-01'],
    'Project': ['Project Alpha', 'Project Beta', 'Project Gamma', 'Project Delta', 'Project Epsilon', 'Project Zeta', 'Project Alpha', 'Project Beta', 'NA'],
    'PerformanceRating': [4.5, 4.7, 4.2, 4.6, 4.1, 4.3, 4.4, 4.0, 4.8],
    'Age': [28, 32, 26, 35, 29, 31, 27, 25, 38],
    'Education': ["Bachelor's", 'PhD', "Master's", 'MBA', "Bachelor's", "Master's", "Bachelor's", "Bachelor's", "Master's"],
    'YearsOfExperience': [5, 7, 3, 10, 4, 6, 4, 2, 12],
    'Bonuses': [3500, 4000, 2800, 5000, 2500, 3200, 3000, 2200, 4500],
    'WorkHoursPerWeek': [40, 42, 38, 45, 40, 41, 40, 39, 42],
    'VacationDaysTaken': [10, 15, 8, 12, 7, 11, 9, 6, 14],
    'TrainingHours': [30, 45, 25, 40, 20, 35, 28, 22, 30],
    'TeamSize': [6, 4, 5, 8, 6, 5, 6, 5, 4],
    'ClientSatisfactionScore': [4.2, 4.5, 4.0, 4.7, 3.9, 4.3, 4.1, 3.8, 4.6]
}

df = pd.DataFrame(data)

# 1. Data Cleaning and Preprocessing
print("1. Data Cleaning and Preprocessing")

# Convert StartDate to datetime
df['StartDate'] = pd.to_datetime(df['StartDate'])

# Check for missing values (note: the string 'NA' in Project is not counted as missing)
print("Missing values:")
print(df.isnull().sum())

# Handle outliers in Salary using IQR method
Q1 = df['Salary'].quantile(0.25)
Q3 = df['Salary'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
df['Salary'] = df['Salary'].clip(lower_bound, upper_bound)

print("\nSalary range after handling outliers:")
print(df['Salary'].describe())

# 2. Data Transformation
print("\n2. Data Transformation")

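# Create a new column 'TenureYears' (fractional years since StartDate).
# Casting a timedelta to '<m8[Y]' fails in recent pandas versions, so divide the day count instead.
df['TenureYears'] = (pd.Timestamp.now() - df['StartDate']).dt.days / 365.25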

# Scale Salary and Bonuses using pandas
df['Salary_Scaled'] = (df['Salary'] - df['Salary'].min()) / (df['Salary'].max() - df['Salary'].min())
df['Bonuses_Scaled'] = (df['Bonuses'] - df['Bonuses'].min()) / (df['Bonuses'].max() - df['Bonuses'].min())

print("New columns added: TenureYears, Salary_Scaled, Bonuses_Scaled")
print(df[['TenureYears', 'Salary_Scaled', 'Bonuses_Scaled']].head())

# 3. Filtering and Sub-setting
print("\n3. Filtering and Sub-setting")

# Subset of employees in Development department
dev_employees = df[df['Department'] == 'Development']
print("Employees in Development department:")
print(dev_employees[['FirstName', 'LastName', 'Role']])

# Filter employees with above-average performance
above_avg_performance = df[df['PerformanceRating'] > df['PerformanceRating'].mean()]
print("\nEmployees with above-average performance:")
print(above_avg_performance[['FirstName', 'LastName', 'PerformanceRating']])

# 4. Data Aggregation
print("\n4. Data Aggregation")

# Average salary by department
avg_salary_by_dept = df.groupby('Department')['Salary'].mean().sort_values(ascending=False)
print("Average salary by department:")
print(avg_salary_by_dept)

# Performance rating statistics by role
perf_stats_by_role = df.groupby('Role')['PerformanceRating'].agg(['mean', 'min', 'max'])
print("\nPerformance rating statistics by role:")
print(perf_stats_by_role)

# 5. Data Joining and Merging
print("\n5. Data Joining and Merging")

# Create a separate DataFrame with department budget info
dept_budget = pd.DataFrame({
    'Department': ['Development', 'Data Science', 'Design', 'Management', 'Quality Assurance', 'Operations', 'Human Resources'],
    'Budget': [500000, 400000, 300000, 450000, 250000, 350000, 200000]
})

# Merge with the main DataFrame
df_with_budget = pd.merge(df, dept_budget, on='Department', how='left')
print("Merged DataFrame with department budget:")
print(df_with_budget[['FirstName', 'LastName', 'Department', 'Salary', 'Budget']].head())

# 6. Pivoting and Reshaping
print("\n6. Pivoting and Reshaping")

# Pivot table: Average performance rating for each role in each department
pivot_perf = pd.pivot_table(df, values='PerformanceRating', index='Department', columns='Role', aggfunc='mean')
print("Pivot table - Average performance rating by role and department:")
print(pivot_perf)

# 7. Data Imputation (for demonstration, let's assume some missing values in TrainingHours)
print("\n7. Data Imputation")

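# Introduce some missing values in TrainingHours
# (cast to float first, because an integer column cannot hold NaN)
df['TrainingHours'] = df['TrainingHours'].astype(float)
df.loc[df.sample(n=3).index, 'TrainingHours'] = float('nan')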

# Impute missing values with mean using pandas
df['TrainingHours'] = df['TrainingHours'].fillna(df['TrainingHours'].mean())

print("TrainingHours after imputation:")
print(df['TrainingHours'])

# 8. Data Visualization
print("\n8. Data Visualization")

# Scatter plot: Years of Experience vs Salary
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='YearsOfExperience', y='Salary', hue='Department')
plt.title('Years of Experience vs Salary')
plt.savefig('experience_vs_salary.png')
plt.close()

# Bar plot: Average Performance Rating by Department
plt.figure(figsize=(10, 6))
df.groupby('Department')['PerformanceRating'].mean().sort_values().plot(kind='bar')
plt.title('Average Performance Rating by Department')
plt.tight_layout()
plt.savefig('avg_performance_by_dept.png')
plt.close()

# Heatmap: Correlation matrix
plt.figure(figsize=(12, 10))
corr_matrix = df.select_dtypes(include='number').corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap')
plt.tight_layout()
plt.savefig('correlation_heatmap.png')
plt.close()

print("Visualizations saved as PNG files.")

# 9. Summary Statistics
print("\n9. Summary Statistics")
print(df.describe())

print("\nAnalysis complete. Check the generated PNG files for visualizations.")

---
---

# TEST SEABORN (10:15 -> 11:00)

---

#### Question 1  
**How can you visualize the residuals of a linear regression model using Seaborn?**  
- a) `sns.scatterplot()`  
- b) `sns.regplot()`  
- c) `sns.residplot()`  
- d) `sns.lineplot()`

#### Question 2  
**Which Seaborn function allows you to plot the relationship between multiple pairs of variables, while also showing the distribution of each variable along the diagonal?**  
- a) `sns.pairplot()`  
- b) `sns.jointplot()`  
- c) `sns.catplot()`  
- d) `sns.heatmap()`

#### Question 3  
**How can you control the number of bins in a histogram when using `sns.histplot()`?**  
- a) By specifying the `binwidth` parameter  
- b) By setting the `bins` parameter  
- c) By using `sns.distplot()` instead  
- d) By changing the `kde` parameter

#### Question 4  
**What does setting `hue` in a Seaborn plot do?**  
- a) Changes the color palette  
- b) Adds additional axes  
- c) Adds a grouping variable for color encoding  
- d) Adjusts the transparency of the plot

#### Question 5  
**What function would you use to plot a grid of subplots based on combinations of categorical variables, such as gender and region, for different plots like scatterplots or bar plots?**  
- a) `sns.FacetGrid()`  
- b) `sns.catplot()`  
- c) `sns.lmplot()`  
- d) `sns.violinplot()`

#### Question 6  
**How would you create a plot that combines a scatterplot with marginal histograms or KDE plots in Seaborn?**  
- a) `sns.pairplot()`  
- b) `sns.jointplot()`  
- c) `sns.catplot()`  
- d) `sns.heatmap()`

#### Question 7  
**How can you customize the axis limits for Seaborn plots?**  
- a) Using `sns.set_axis_limits()`  
- b) Using `plt.xlim()` and `plt.ylim()` from Matplotlib  
- c) Using `sns.set_style()`  
- d) Using `sns.set_context()`

#### Question 8  
**What does `sns.violinplot()` visualize, and how is it different from `sns.boxplot()`?**  
- a) Violin plots visualize the distribution of data and density, while boxplots summarize quartiles and outliers  
- b) Violin plots are only used for categorical data, while boxplots are for numerical data  
- c) Violin plots show exact data points, boxplots don't  
- d) Violin plots are used for time series data, boxplots are not

#### Question 9  
**How can you add a regression line to a scatterplot with Seaborn, while also showing confidence intervals for the regression line?**  
- a) `sns.scatterplot()`  
- b) `sns.regplot()`  
- c) `sns.lmplot()`  
- d) `sns.jointplot()`

#### Question 10  
**How can you create a multi-panel grid of plots using Seaborn's `FacetGrid()` and map different plots onto different axes?**  
- a) Using the `map()` function  
- b) Using `pairplot()`  
- c) Using `set_context()`  
- d) Using `subplots()` from Matplotlib



# PLOTLY (-> 11:30)

---


Plotly is a widely used open-source library for creating interactive, high-quality visualizations in Python. It is particularly valuable for data scientists, analysts, and researchers who need to explore complex datasets interactively. Plotly supports both static and dynamic visualizations, and its charts can be embedded in web applications, presentations, and reports.


## **3. Key Features**

##### **3.1. Interactive Charts**
One of Plotly's standout features is its ability to generate highly interactive charts. Users can zoom in, pan across data, click on elements for more information, and interact with the chart to gain deeper insights. This level of interactivity makes data exploration intuitive and efficient.

##### **3.2. Variety of Chart Types**
Plotly provides support for a wide range of chart types, covering both common and advanced visualizations. Some of the key chart types include:

- **Scatter Plots:** Used to display relationships between two variables on a two-dimensional grid.
- **Line Charts:** Ideal for visualizing trends over time or continuous data.
- **Bar Charts:** Useful for comparing categories or groups.
- **Pie Charts:** A simple way to show proportions within a dataset.
- **3D Plots:** Visualize data in three dimensions, especially useful for multivariate datasets.
- **Heatmaps:** Show intensity of values in a matrix, ideal for correlation matrices or any kind of density plot.
- **Histograms:** Summarize the distribution of numerical data.

These diverse chart types make Plotly a versatile tool for both exploratory data analysis and communication of findings.

##### **3.3. Customization and Layout Options**
Plotly is highly customizable, allowing users to fine-tune visual aspects for each chart. Some of the customization options include:

- **Axes Configuration:** You can modify axis labels, scales (logarithmic, linear), and add grid lines to enhance readability.
- **Colors and Markers:** Choose from a wide array of color palettes and customize markers for data points, making your charts both aesthetically pleasing and informative.
- **Titles, Legends, and Labels:** Add and adjust titles, legends, and labels to clearly communicate the story behind your data. These elements can be positioned and styled according to your preferences.

##### **3.4. Data Sources and Formats**
Plotly is highly compatible with various data formats, including:

- **Pandas DataFrames:** A powerful data structure in Python commonly used for data analysis.
- **NumPy Arrays:** Essential for mathematical computations and array manipulations.
- **CSV and JSON Files:** Plotly does not parse data files itself; load them with pandas (`pd.read_csv()`, `pd.read_json()`) and pass the resulting DataFrame to Plotly.
- **Databases:** Query results from SQL databases can be loaded into a DataFrame (for example with `pd.read_sql()`) and then plotted.

## **4. Plotly Express**

Plotly Express is a simplified interface for generating visualizations with minimal lines of code. It is designed to streamline the creation of common chart types, while still offering powerful customization options. Plotly Express wraps Plotly's underlying graph objects and automatically handles much of the data processing and configuration.

For example, functions like `px.scatter()` and `px.line()` allow users to create scatter plots and line charts with just a few commands, drastically speeding up the workflow for rapid prototyping and exploratory analysis.

## **5. Integration and Export Options**

##### **5.1. Integration with Web Applications**
Plotly's charts can easily be embedded into web applications using frameworks like Flask or Django. This makes it ideal for building interactive dashboards where users can explore data dynamically.

##### **5.2. Jupyter Notebooks**
Plotly seamlessly integrates with Jupyter Notebooks, one of the most popular tools for data analysis. It allows you to display interactive visualizations directly within the notebook interface, providing an enriched analysis experience.

##### **5.3. Exporting Charts**
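Plotly charts can be exported in various formats, making them adaptable to different reporting needs. Static image export (PNG, JPEG, SVG, PDF) uses `fig.write_image()` and requires the additional `kaleido` package; HTML export uses `fig.write_html()`. You can export charts as: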

- **PNG or JPEG:** High-resolution static images for reports or publications.
- **SVG:** Scalable vector graphics that maintain quality at any size.
- **PDF:** For easily sharing charts in print-friendly formats.
- **HTML:** Fully interactive charts that can be embedded in websites or shared as standalone files.

These export options ensure that your visualizations can be shared and reused in multiple contexts.

## **6. Advanced Features**

##### **6.1. Animation Support**
Plotly allows you to add animations to charts, making it possible to visualize how data changes over time. This is especially useful for time-series data and dynamic trends.

##### **6.2. Subplots and Faceting**
Plotly provides functions for creating subplots or facet plots, where multiple charts are displayed within a single figure. This is useful for comparing different variables or groups side-by-side.

##### **6.3. Statistical Charts**
In addition to standard visualizations, Plotly also supports statistical chart types such as:

- **Box Plots:** Show distribution based on quartiles.
- **Violin Plots:** Show the shape of a distribution as a mirrored density curve, optionally split by category.

## **7. Resources and Learning**

- **Official Documentation:** The official Plotly documentation provides comprehensive tutorials, examples, and reference guides for all features.
- **Plotly Dash:** Plotly's `Dash` framework allows users to create full-fledged analytical web applications with just Python, simplifying the development of data dashboards.
- **Community and Support:** Plotly has an active community and support forums where users can seek help, share ideas, and find inspiration for their projects.


---
---

# TODAY'S TASKS: YOUR OWN STARTUP ANALYSIS WITH DIFFERENT ANALYSES AND PLOTLY USAGE

---
