Visualization Tool: dash.plotly

# Task 1a)
Read in the data set "Aufgabe-1.csv" and examine the purpose of the data as well as the
data quality. Identify possible problems in the data and fix the errors where
possible. Document your approach.  

In [None]:
import pandas as pd

In [None]:
#df = pd.read_csv('aufgabe1_ori.csv')

## Clean data

![find error](1.png) 

1. Reading the file directly with `df = pd.read_csv('aufgabe1.csv')` raises a **ParserError**, because the data is not clean and has to be fixed first.
Concrete error: **ParserError: Error tokenizing data. C error: Expected 89 fields in line 3, saw 90**  
There should be 89 columns, but some rows have 90. The cause is the "Positions Played" column: some players have played more than one position, and the data set joins these positions with a comma, e.g. 'CF,ST'. Since the comma is also the column separator, the parser splits such a value into two fields.
The fix is to replace every comma inside single quotes with a semicolon, so that 'A,B' becomes 'A;B'. A regex can identify these cases.
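
Before fixing anything, it can help to list the rows whose field count differs from the expected one. The following is a minimal sketch using only the standard library; the helper name `find_bad_rows` and the three-column sample are hypothetical stand-ins, not part of the original notebook.

```python
import csv
import io


def find_bad_rows(lines, expected_fields):
    """Return (row_number, field_count) for every row with an unexpected field count."""
    bad = []
    # csv.reader splits on commas; single quotes are NOT treated as quoting by default,
    # so 'CF,ST' is split into two fields, just like in the ParserError above.
    for row_no, row in enumerate(csv.reader(lines), start=1):
        if len(row) != expected_fields:
            bad.append((row_no, len(row)))
    return bad


# Tiny in-memory stand-in for the real file (hypothetical sample rows).
sample = io.StringIO(
    "Name,Positions Played,Age\n"
    "A. Player,'CF,ST',25\n"
    "B. Player,'GK',31\n"
)
bad_rows = find_bad_rows(sample, expected_fields=3)
print(bad_rows)  # [(2, 4)]
```

For the real file, the same helper could be called with an open file handle, e.g. `with open('aufgabe1_ori.csv', newline='') as f: find_bad_rows(f, 89)`.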

In [None]:
import re # regex
import pandas as pd

# Specify input and output file paths
input_file = 'aufgabe1_ori.csv'
output_file = 'aufgabe1_clean.csv'

# Read the CSV file line by line and replace commas within single quotes
lines = []
with open(input_file, 'r') as file:
    for line in file:
        # Use regex to find commas within single quotes and replace them with a semicolon
        modified_line = re.sub(r"'(.*?)'", lambda x: x.group(0).replace(',', ';'), line)
        lines.append(modified_line)

# Write the modified lines to a new CSV file
with open(output_file, 'w') as file:
    file.writelines(lines)

# Now read the cleaned data with pandas
# df = pd.read_csv(output_file)

2. Another error: ParserError: Error tokenizing data. C error: Expected 89 fields in line 12922, saw 90
![find error](2.png)
The cause is an amount written in German notation: 1100000,00€. The decimal comma is again interpreted as a column separator. Because this is the only occurrence, I changed it manually in the .csv file, e.g. 1100000,00€ --> 1100000. **Change it in aufgabe1_ori.csv!!!**
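
If the manual edit should be avoided, the same replacement could be scripted. This is only a sketch under assumptions: the pattern `EURO_AMOUNT` and the helper `strip_euro_decimals` are hypothetical names, the cents are simply dropped, and the pattern would also merge a numeric column that happens to be followed by a column such as `50€`, so the result should be checked.

```python
import re

# One or more digits, a decimal comma, exactly two digits, then the euro sign.
EURO_AMOUNT = re.compile(r"(\d+),\d{2}€")


def strip_euro_decimals(line):
    """Rewrite amounts like 1100000,00€ as 1100000 (the cents are discarded)."""
    return EURO_AMOUNT.sub(r"\1", line)


# Hypothetical sample row containing the problematic amount.
fixed = strip_euro_decimals("X. Player,27,1100000,00€,82\n")
print(fixed)  # X. Player,27,1100000,82
```

Such a helper could be applied to each line in the same loop that replaces the commas inside single quotes.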

In [None]:
# Now read the cleaned data with pandas
df = pd.read_csv(output_file)

3. DtypeWarning: Columns (4) have mixed types

In [None]:
# get data types of all columns
print(df.dtypes)

In [None]:
print(df.columns)

In [None]:
# Check data types in column 4
unique_types = df['Potential'].map(type).value_counts()
print(unique_types)
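
One possible way to deal with a mixed-type column is to convert it to numbers and inspect whatever fails to parse. The sketch below uses a small hypothetical series in place of the real column; whether coercion to NaN is appropriate for the actual data has to be decided after looking at the unparsed values.

```python
import pandas as pd

# Hypothetical stand-in for a column that mixes numbers and text.
mixed = pd.Series([91, "88", "n/a", 75.0], name="Potential")

# Shows which Python types occur in the column and how often.
print(mixed.map(type).value_counts())

# Convert everything that can be parsed; anything else becomes NaN.
numeric = pd.to_numeric(mixed, errors="coerce")

# Original values that could not be converted, for manual inspection.
unparsed = mixed[numeric.isna() & mixed.notna()]
print(unparsed.tolist())
```

Alternatively, `pd.read_csv` accepts the `dtype` and `low_memory` parameters, which control how the column is typed at import time.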

## Purpose of the data

In [None]:
print(df.head())

In [None]:
print(df.tail())

In [None]:
print(df.info())

In [None]:
print(df.describe())

This data set shows (to be completed):
- Information about soccer players, including basic personal information, skill analysis, values, etc.
- 

# Task 1b)-1d)
b) Visualize the data from a) in an interactive Python application by showing the distributions:  
- Distribution of the items ???
- Comparison of the attributes Age and Wage (in euros)
- Comparison of the attributes Age and Overall  


c) Add interaction options for filtering the data to all plots. If the data contains outliers, adapt your visualization accordingly. In addition, add filters for the attributes Nationality and Club.  
d) Create a view for comparing data points, e.g. row 5 with row 35 of the data set. The data points to compare should be selectable interactively. 
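
The filtering and comparison logic behind c) and d) can be kept in plain pandas helpers that a Dash callback would later call. The following is a sketch, not the final app: the function names `filter_players` and `compare_rows` are hypothetical, the sample frame is invented, and `Wage` is assumed to be numeric already. Outliers are handled here by cutting at a quantile, which is only one of several possible choices.

```python
import pandas as pd


def filter_players(df, nationality=None, club=None, max_wage_quantile=None):
    """Filter by Nationality and Club; optionally drop rows above a Wage quantile."""
    out = df
    if nationality:
        out = out[out["Nationality"] == nationality]
    if club:
        out = out[out["Club"] == club]
    if max_wage_quantile is not None and not out.empty:
        cap = out["Wage"].quantile(max_wage_quantile)
        out = out[out["Wage"] <= cap]
    return out


def compare_rows(df, i, j):
    """Return rows i and j (by position) side by side, one attribute per line."""
    return df.iloc[[i, j]].T


# Hypothetical sample data using the column names from the task description.
players = pd.DataFrame({
    "Nationality": ["Germany", "Germany", "Spain", "Spain"],
    "Club": ["A", "B", "A", "B"],
    "Age": [21, 30, 25, 28],
    "Wage": [1000, 2000, 1500, 900000],
})

german = filter_players(players, nationality="Germany")
no_outliers = filter_players(players, max_wage_quantile=0.75)
comparison = compare_rows(players, 0, 3)
print(comparison)
```

In the app, the dropdown values for Nationality and Club and the two selected row numbers would be passed to these helpers inside a callback, and the returned frames would feed the plots.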

## "Excel" App

In [None]:
# Import packages
from dash import Dash, html, dash_table

# Incorporate data
df = pd.read_csv('aufgabe1_clean.csv')

# Initialize the app
app = Dash()

# App layout
app.layout = [
    html.Div(children='My First App with Data'),
    dash_table.DataTable(data=df.to_dict('records'), page_size=20)
]

# Run the app
if __name__ == '__main__':
    app.run(debug=True)


## Task App

In [None]:
import pandas as pd
from dash import Dash, html, dcc, Input, Output
import plotly.express as px
import dash_bootstrap_components as dbc
from dash_bootstrap_templates import load_figure_template


# read data from the csv file
df = pd.read_csv("aufgabe1_clean.csv")

# Initialize dash app with bootstrap theme
load_figure_template('morph')
app = Dash(__name__, external_stylesheets=[dbc.themes.MORPH])

min_year = 1
max_year = 10
team_states = "j"

# Layout, HTML Components
app.layout = html.Div([
    
    # Dashboard
    html.Div([
        html.H1('IVDA Praktikum Aufgabe 1', className='text-center pb-3'),
        
        html.Div([
            # first section: distribution of the items
            html.Div([
                html.H3('Distribution of the items')
            ], className='col-12 col-xl-6 p-3'),
            # second section: correlation between Wage and Age, Overall and Age
            html.Div([
                html.H3('Correlation'),
                html.Div()  # empty placeholder for the plots
            ],className='col-12 col-xl-6 p-3')
        ], className = 'row'),

        # third section: comparison between players
        html.Div([
            html.H3('Comparison')
        ])
    ], className = 'container-fluid'),

    # Footer
    html.Footer([
        html.Div([
            html.A('Author: Huo Jiang, Tina', href='https://github.com'),
            html.Span(' | '),
            html.A('Copyright', href='https://www.uni-leipzig.de')
        ], className='bg-light text-dark text-center py-3 fs-5')
    ])
])
if __name__ == '__main__':
    app.run(debug=True)


Columns (4) have mixed types. Specify dtype option on import or set low_memory=False.

