
**Extract some summary statistics of the money spent by the Senate of Berlin**

Write a function that takes the data frame of spendings and returns a list with

- the count
- the mean
- the standard deviation
- the minimum
- the 25th percentile
- the 50th percentile (median)
- the 75th percentile
- the maximum
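
As a minimal sketch of what such a summary looks like, the snippet below runs the pandas `describe()` method on a small made-up Series. The name `toy_amounts` and its values are assumptions for illustration only and are not taken from the Berlin dataset. For a numeric Series, `describe()` returns the eight statistics in the order listed above.

```python
import pandas as pd

# Hypothetical toy amounts, not taken from the real dataset
toy_amounts = pd.Series([100.0, 200.0, 300.0, 400.0], name="Betrag")

# describe() returns count, mean, std, min, 25%, 50%, 75%, max (in that order)
toy_summary = toy_amounts.describe()
print(toy_summary)
```

The result is itself a Series, so individual statistics can be read by label, for example `toy_summary["50%"]` for the median.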
 



In [1]:
import numpy as np
import pandas as pd

df = pd.read_csv("data/zuwendungen-berlin.csv.gz")

def assignment_01(df):
    """
    Return summary statistics of the spending amounts.

    This function takes the DataFrame loaded from the 'Zuwendungen-Berlin' CSV dataset,
    selects the 'Betrag' column, and summarizes it with the describe method.

    Input:
        - df: DataFrame, the input DataFrame from the 'Zuwendungen-Berlin' CSV dataset.

    Output:
        - spendings_summary: Series, a statistical summary of the spending amounts
          (count, mean, std, min, 25%, 50%, 75%, max).
    """
    # Your code here
    # Select the 'Betrag' column (a pandas Series of spending amounts)
    spendings = df['Betrag']
    # Use the describe method to generate a statistical summary of the spending amounts
    spendings_summary = spendings.describe()
    return spendings_summary

def assignment_01_test():
    spending_statistics = np.array(
        [
            4.08200000e04,
            2.29215965e05,
            3.93196343e06,
            1.00000000e02,
            4.67300000e03,
            1.64770000e04,
            6.11755000e04,
            4.87261162e08,
        ]
    )
    print(assignment_01(df) - spending_statistics)
    assert np.allclose(assignment_01(df), spending_statistics)


assignment_01_test()

count    0.000000
mean    -0.000032
std     -0.001190
min      0.000000
25%      0.000000
50%      0.000000
75%      0.000000
max      0.000000
Name: Betrag, dtype: float64




**How much is each recipient of a spending receiving in total?**

Write a function ``assignment_02`` that takes the data frame of spendings and groups by recipient (column ``'Name'``) and then sums all money received for each recipient. Return the names of the recipients that received in total 143 Euros. 
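
The group-sum-filter pattern described above can be sketched on a toy table. The recipient names and amounts below are made up for illustration, and `toy`, `totals` and `matching_names` are hypothetical names, not part of the assignment.

```python
import pandas as pd

# Hypothetical toy data; names and amounts are made up for illustration
toy = pd.DataFrame({
    "Name": ["Club A", "Club B", "Club A", "Club C"],
    "Betrag": [100, 143, 43, 50],
})

# Sum all amounts per recipient, then keep recipients whose total is exactly 143
totals = toy.groupby("Name")["Betrag"].sum()
matching_names = totals[totals == 143].index.tolist()
print(matching_names)
```

Here "Club A" qualifies only after its two payments are summed, which is why the grouping has to happen before the filter. Exact comparison works for integer totals; for floating-point amounts a tolerance-based check such as `np.isclose` may be safer.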



In [2]:
def assignment_02(df):
    """
    Return the names of all recipients whose total received amount equals 143 Euros.

    The function sums 'Betrag' for each unique 'Name' using the groupby and sum methods,
    which yields a Series indexed by recipient name, and then keeps only the recipients
    whose total equals 143.

    Input:
        - df: DataFrame, the input DataFrame containing 'Name' and 'Betrag' columns.

    Output:
        - result: Index, the index (names) of recipients whose total sum of 'Betrag' equals 143.
    """
    # Your code here
    # Group by 'Name' and calculate the sum of 'Betrag' for each recipient
    totals_per_recipient = df.groupby('Name')['Betrag'].sum()
    # Filter recipients with a total of 143 Euros
    result = totals_per_recipient[totals_per_recipient == 143]
    return result.index

def assignment_02_test():
    result = sorted(assignment_02(df))
    assert (result[0] == "Rock 'n' Roll Club Pinguin Berlin e. V.") and (
        result[1] == "Triathlongemeinschaft Sisu Berlin e. V."
    )


assignment_02_test()



**How much is Berlin spending on each policy area?**

Write a function ``assignment_03`` that takes the data frame of spendings (spending is the column ``'Betrag'``), groups by policy area (in German ``'Politikbereich'``) and computes the

 - minimum
 - median
 - maximum

of the spendings in each policy area. Return the aggregates for the policy area 'sciences' (``'Wissenschaft'``).
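
A minimal sketch of this grouped aggregation on made-up data follows. The amounts are illustrative assumptions and `toy`, `aggregates` and `science_row` are hypothetical names; only the column names `'Politikbereich'` and `'Betrag'` come from the task.

```python
import pandas as pd

# Hypothetical toy data; the amounts are made up for illustration
toy = pd.DataFrame({
    "Politikbereich": ["Wissenschaft", "Wissenschaft", "Wissenschaft", "Verkehr", "Verkehr"],
    "Betrag": [500.0, 1500.0, 4000.0, 200.0, 800.0],
})

# One row per group, one column per aggregate
aggregates = toy.groupby("Politikbereich")["Betrag"].agg(["min", "median", "max"])

# Select a single group by its index label
science_row = aggregates.loc["Wissenschaft"]
print(science_row.to_numpy())
```

Passing a list of function names to `agg` produces one column per aggregate, so a single `.loc` lookup returns all three values for the requested group.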




In [3]:
def assignment_03(df):
    """
    Return the minimum, median, and maximum spending for the 'sciences' policy area.

    This function takes a DataFrame of spendings as a parameter. It groups the DataFrame by the
    'Politikbereich' column, calculates the minimum, median, and maximum of 'Betrag' for each
    policy area, and selects the row for 'Wissenschaft' ('sciences').

    Input:
        - df: DataFrame, the input DataFrame containing the 'Politikbereich' and 'Betrag' columns.

    Output:
        - sciences_aggregates: ndarray, the aggregates [minimum, median, maximum] for 'Wissenschaft'.
    """
    # Your code here
    # Group by 'Politikbereich' and calculate minimum, median, and maximum
    aggregates = df.groupby('Politikbereich')['Betrag'].agg(['min', 'median', 'max'])
    # Select the row for 'Wissenschaft' and return its values as a NumPy array
    return aggregates.loc['Wissenschaft'].values


def assignment_03_test():
    correct = np.array([500.0, 115557.5, 41852102.0])
    assert np.array_equal(assignment_03(df), correct)


assignment_03_test()



**How much is Berlin spending on each U-Bahn line?**

Write a function ``assignment_04`` that takes the data frame of spendings, filters for transportation (German ``'Verkehr'``), groups by the specific U-Bahn line and sums up the spendings. For the grouping you can extract the U-Bahn line with the regular expression ``'U[1-9]'``. The function should return the U-Bahn line names ordered from most expensive (first element) to least expensive (last element).
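
The extraction step can be sketched on a toy table. The purpose texts in `'Zweck'` and the amounts are made up for illustration, and `toy` and `ranking` are hypothetical names. The sketch assumes the line name appears literally in the purpose text.

```python
import pandas as pd

# Hypothetical toy data; purpose texts and amounts are made up for illustration
toy = pd.DataFrame({
    "Zweck": ["Sanierung U5 Tunnel", "Ausbau U2", "Neubau Radweg", "Aufzug U5"],
    "Betrag": [300.0, 500.0, 100.0, 400.0],
})

# The capture group (U[1-9]) is what str.extract returns;
# expand=False yields a Series, and rows without a match become NaN
toy["ubahn"] = toy["Zweck"].str.extract(r"(U[1-9])", expand=False)

# groupby drops NaN keys by default, so unmatched rows are ignored
ranking = (
    toy.groupby("ubahn")["Betrag"].sum().sort_values(ascending=False).index.tolist()
)
print(ranking)
```

Note that `str.extract` requires at least one capture group, so the pattern `'U[1-9]'` from the task has to be wrapped in parentheses.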




In [4]:
def assignment_04(df):
    """
    Return the U-Bahn line names ordered from most to least expensive.

    This function takes a DataFrame of spendings as a parameter. It filters for transportation
    ('Verkehr'), extracts the U-Bahn line from the 'Zweck' column with a regular expression,
    groups by that line, and sums up the spendings.

    Input:
        - df: DataFrame, the input DataFrame containing the 'Politikbereich', 'Zweck' and 'Betrag' columns.

    Output:
        - ubahn_names: list, U-Bahn names ordered by spending (from most to least expensive).
    """
    # Your code here
    # Filter for transportation ('Verkehr') and copy so the original DataFrame is not modified
    df_transport = df.loc[df['Politikbereich'] == 'Verkehr'].copy()
    # Extract the U-Bahn line with a regular expression (expand=False returns a Series)
    df_transport['ubahn'] = df_transport['Zweck'].str.extract(r'(U[1-9])', expand=False)
    # Group by U-Bahn line and sum the spendings (rows without a match are NaN and are dropped)
    ubahn_spendings = df_transport.groupby('ubahn')['Betrag'].sum()
    # Order U-Bahn names by spending (from most to least expensive)
    return ubahn_spendings.sort_values(ascending=False).index.tolist()

    
def assignment_04_test():
    ubahn_cost_ranking = ["U5", "U2", "U1", "U6", "U8", "U7", "U9", "U3", "U4"]
    assert all([x == y for x, y in zip(assignment_04(df), ubahn_cost_ranking)])


assignment_04_test()

In [5]:
import csv
import pycountry
import pycountry_convert as pc

def get_continent(country_alpha_2):
    try:
        country_info = pycountry.countries.get(alpha_2=country_alpha_2)
        if country_info:
            continent_code = pc.country_alpha2_to_continent_code(country_alpha_2)
            if continent_code:
                continent_name = pc.convert_continent_code_to_continent_name(continent_code)
                return continent_name
    except KeyError:
        pass
    return 'Unknown'

def generate_all_countries_data():
    # Get a list of all countries
    all_countries = list(pycountry.countries)

    # Generate the data for all countries
    countries_data = [
        {'ID': i + 1, 'Country': country.name, 'Continent': get_continent(country.alpha_2)}
        for i, country in enumerate(all_countries)
    ]

    return countries_data

def save_countries_to_csv(data, filename):
    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = ['ID', 'Country', 'Continent']
        csv_writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        
        csv_writer.writeheader()
        csv_writer.writerows(data)

# Example usage:
countries_data = generate_all_countries_data()
csv_filename = 'all_countries_data.csv'
save_countries_to_csv(countries_data, csv_filename)

print(f'Data for all countries has been saved to {csv_filename}.')


Data for all countries has been saved to all_countries_data.csv.


In [6]:
df1 = pd.read_csv("countries_tourism_arr.csv", sep=";", skiprows=4)
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 266 entries, 0 to 265
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Country Name  266 non-null    object
 1   2006          229 non-null    object
dtypes: object(2)
memory usage: 4.3+ KB


In [7]:
df1

Unnamed: 0,Country Name,2006
0,Aruba,1285000
1,Africa Eastern and Southern,"2,26503E+14"
2,Afghanistan,
3,Africa Western and Central,"6,95561E+14"
4,Angola,121000
...,...,...
261,Kosovo,
262,"Yemen, Rep.",
263,South Africa,8509000
264,Zambia,757000


In [8]:
df1.loc[df1['Country Name'] == 'Germany']

Unnamed: 0,Country Name,2006
55,Germany,23569000


In [9]:
import pandas as pd


# List of European countries (modify as needed)
european_countries = [
    "Albania", "Andorra", "Armenia", "Austria", "Azerbaijan",
    "Belarus", "Belgium", "Bosnia and Herzegovina", "Bulgaria", "Croatia",
    "Cyprus", "Czech Republic", "Denmark", "Estonia", "Finland",
    "France", "Georgia", "Germany", "Greece", "Hungary",
    "Iceland", "Ireland", "Italy", "Kazakhstan", "Kosovo",
    "Latvia", "Liechtenstein", "Lithuania", "Luxembourg", "Malta",
    "Moldova", "Monaco", "Montenegro", "Netherlands", "North Macedonia",
    "Norway", "Poland", "Portugal", "Romania", "Russia",
    "San Marino", "Serbia", "Slovakia", "Slovenia", "Spain",
    "Sweden", "Switzerland", "Turkey", "Ukraine", "United Kingdom",
    "Vatican City (Holy See)"
]

# Filter DataFrame for European countries
european_df = df1[df1['Country Name'].isin(european_countries)]
european_df.to_csv('european_countries_t_arr.csv', index=False)
# Display the filtered DataFrame
print(european_df)


               Country Name       2006
5                   Albania     937000
6                   Andorra   10737000
10                  Armenia     382000
14                  Austria   20269000
15               Azerbaijan    1194000
17                  Belgium    6995000
21                 Bulgaria    7499000
24   Bosnia and Herzegovina     256000
25                  Belarus    5276000
37              Switzerland    7863000
53                   Cyprus    2629000
55                  Germany   23569000
58                  Denmark   26936000
70                    Spain   96152000
71                  Estonia        NaN
75                  Finland    3375000
77                   France  193882000
81           United Kingdom   32713000
82                  Georgia     983000
89                   Greece   17284000
99                  Croatia   47733000
101                 Hungary   38318000
111                 Ireland    8001000
114                 Iceland     477000
116                   Ita

In [13]:
import csv

# Specify the file path
csv_file_path = 'google_review_ratings.csv'

# Read existing data from the CSV file
existing_data = []
try:
    with open(csv_file_path, 'r', newline='') as csvfile:
        existing_data = list(csv.reader(csvfile))
except FileNotFoundError:
    print(f"File {csv_file_path} not found.")
else:
    # Remove the second row (index 1)
    if len(existing_data) > 1:
        del existing_data[1]
        print("Second row deleted.")
    else:
        print("Dataset does not have a second row.")

    # Write the updated data back only if the file was read successfully,
    # so a missing file is not replaced by an empty one
    with open(csv_file_path, 'w', newline='') as csvfile:
        csv_writer = csv.writer(csvfile)
        csv_writer.writerows(existing_data)

    print(f"Updated data written to {csv_file_path}")


Second row deleted.
Updated data written to google_review_ratings.csv


In [14]:
df2 = pd.read_csv("google_review_ratings.csv", sep=",")
df2

Unnamed: 0,Unique user id,churches,resorts,beaches,parks,theatres,museums,malls,zoo,restaurants,...,art galleries,dance clubs,swimming pools,gyms,bakeries,beauty & spas,cafes,view points,monuments,gardens
User 1,0.00,0.00,3.63,3.65,5.00,2.92,5.00,2.35,2.33,2.64,...,0.59,0.50,0.00,0.50,0.00,0.00,0.0,0.0,0.00,
User 2,0.00,0.00,3.63,3.65,5.00,2.92,5.00,2.64,2.33,2.65,...,0.59,0.50,0.00,0.50,0.00,0.00,0.0,0.0,0.00,
User 3,0.00,0.00,3.63,3.63,5.00,2.92,5.00,2.64,2.33,2.64,...,0.59,0.50,0.00,0.50,0.00,0.00,0.0,0.0,0.00,
User 4,0.00,0.50,3.63,3.63,5.00,2.92,5.00,2.35,2.33,2.64,...,0.59,0.50,0.00,0.50,0.00,0.00,0.0,0.0,0.00,
User 5,0.00,0.00,3.63,3.63,5.00,2.92,5.00,2.64,2.33,2.64,...,0.59,0.50,0.00,0.50,0.00,0.00,0.0,0.0,0.00,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
User 5452,0.91,5.00,4.00,2.79,2.77,2.57,2.43,1.09,1.77,1.04,...,0.66,0.65,0.66,0.69,5.00,1.05,5.0,5.0,1.56,
User 5453,0.93,5.00,4.02,2.79,2.78,2.57,1.77,1.07,1.76,1.02,...,0.65,0.64,0.65,1.59,1.62,1.06,5.0,5.0,1.09,
User 5454,0.94,5.00,4.03,2.80,2.78,2.57,1.75,1.05,1.75,1.00,...,0.65,0.63,0.64,0.74,5.00,1.07,5.0,5.0,1.11,
User 5455,0.95,4.05,4.05,2.81,2.79,2.44,1.76,1.03,1.74,0.98,...,0.64,0.63,0.64,0.75,5.00,1.08,5.0,5.0,1.12,


In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Your data
data = {'COUNTRY': ['Croatia', 'Czechia', 'Germany', 'Greece', 'Hungary', 'Ireland', 'Italy', 'Portugal', 'Slovenia', 'Spain', 'Sweden', 'Switzerland', 'Turkey', 'United Kingdom'],
        'Activities': ['Photography Tour', 'Cycling Tour', 'Hot Air Balloon Ride', 'Photography Tour', 'Nightlife Experience', 'Festival', 'Culinary Class', 'Sightseeing', 'Wine Tasting', 'Cruise', 'Sightseeing', 'Art Gallery Visit', 'Photography Tour', 'Art Gallery Visit'],
        'DAYS_SPENT': [27, 8, 21, 25, 26, 22, 24,25, 19, 26, 9, 28, 8, 3]}

df = pd.DataFrame(data)

# Pivot the data for the heatmap
heatmap_data = df.pivot(index='COUNTRY', columns='Activities', values='DAYS_SPENT')

# Create a heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(heatmap_data, cmap='viridis', annot=True, fmt='g', linewidths=.5)
plt.title('Days Spent in Different Countries by Activity')
plt.xlabel('Activities')
plt.ylabel('Country')
plt.show()


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Sample DataFrame creation (replace this with your actual DataFrame)
data = {'Country': ['Belgium', 'Belgium', 'Belgium', 'Croatia', 'Croatia', 'Croatia', 'Finland', 'Finland', 'Finland', 'France', 'France', 'France', 'Germany', 'Germany', 'Germany', 'Greece', 'Greece', 'Greece', 'Hungary', 'Hungary', 'Hungary', 'Turkey', 'Turkey', 'Turkey', 'United Kingdom', 'United Kingdom', 'United Kingdom', 'Italy', 'Italy', 'Italy', 'Spain', 'Spain', 'Spain', 'Switzerland', 'Switzerland', 'Switzerland', 'Portugal', 'Portugal', 'Portugal', 'Netherlands', 'Netherlands', 'Netherlands'],
        'BookingSystem': ['Online', 'In-person','Phone','Online','In-person','Phone', 'Online','In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone',  'Online','In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person','Phone','Online',        'In-person',        'Phone',        'Online','In-person', 'Phone','Online','In-person','Phone','Online',        'In-person',      'Phone',     'Online','In-person','Phone',      'Online','In-person','Phone'],
        'Score': [4.43, 4.76, 3.5, 3.45, 3.35, 4.5, 3.68, 4.33, 4.89, 3.36, 3.73, 3.73, 3.24, 4.05, 3.61, 4.01, 3.78, 4.25, 3.29, 3.22, 3.86, 4.82, 4.82, 3.48, 3.16, 4.2, 3.16, 3.07, 4.46, 3.26, 4.79, 3.33, 3.62, 3.51, 3.02, 3.37, 4.37, 4.37, 4.2, 3.5, 4.1, 3.7]
}

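# Compute the length of each column so the consistency check below has defined names
length_country = len(data['Country'])
length_booking_system = len(data['BookingSystem'])
length_score = len(data['Score'])

if length_country == length_booking_system == length_score: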
    df = pd.DataFrame(data)

    # Use pivot_table to handle duplicate entries by taking the mean
    pivot_df = df.pivot_table(index='Country', columns='BookingSystem', values='Score')

    # Round the scores to 2 decimal places
    pivot_df = pivot_df.round(2)

    # Plotting the bar chart
    ax = pivot_df.plot(kind='bar', stacked=True, figsize=(10, 6))

    # Adding labels and title
    plt.xlabel('Country')
    plt.ylabel('Score')
    plt.title('Scores by Country and Booking System')

    # Annotate each bar segment with its own height, which is the score it represents.
    # ax.patches is ordered column by column, so reading the height avoids index mix-ups.
    for patch in ax.patches:
        width, height = patch.get_width(), patch.get_height()
        x, y = patch.get_xy()
        ax.text(x + width/2, y + height/2, f'{height:.2f}',
                ha='center', va='center', color='black', fontsize=8)

    ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
    # Show the plot
    plt.show()
else:
    print("Error: All lists must have the same length.")



In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Sample DataFrame creation (replace this with your actual DataFrame)
countries = ['Belgium', 'Croatia', 'Czechia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', 'Ireland', 'Italy', 'Netherlands', 'Portugal', 'Spain', 'Sweden', 'Switzerland', 'Turkey', 'United Kingdom']
booking_channels = ['Online', 'In-person', 'Phone']

# Generate random approximate numbers for tourists (you can replace this with actual data)
np.random.seed(42)
tourist_numbers = np.random.randint(1, 10, size=(len(countries), len(booking_channels)))

data = {'Country': np.repeat(countries, len(booking_channels)),
        'Booking_Channel': booking_channels * len(countries),
        'Tourist_Numbers': tourist_numbers.flatten() * 1e6}  # Convert numbers to millions

df = pd.DataFrame(data)

# Check the DataFrame
print(df)

# Use pivot_table to handle duplicate entries by taking the sum
pivot_df = df.pivot_table(index='Country', columns='Booking_Channel', values='Tourist_Numbers', aggfunc='sum')

# Plotting the bar chart
ax = pivot_df.plot(kind='bar', stacked=True, figsize=(12, 7))

# Adding labels and title
plt.xlabel('Country')
plt.ylabel('Tourist Numbers (Millions)')
plt.title('Approximate Number of Bookings via Booking Channel by Country (in Millions)')

# Annotate each bar with the corresponding tourist numbers
for container in ax.containers:
    # Build the label texts explicitly: fmt is ignored when labels are passed
    ax.bar_label(container, labels=[f'{value / 1e6:.2f}M' for value in container.datavalues],
                 fontsize=6, color='black', padding=3)

# Move the legend to the right side
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))

# Show the plot
plt.show()
df.to_csv('tourist_data.csv', index=False)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Sample DataFrame creation (replace this with your actual DataFrame)
countries = ['Belgium', 'Croatia', 'Czechia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', 'Ireland', 'Italy', 'Netherlands', 'Portugal', 'Spain', 'Sweden', 'Switzerland', 'Turkey', 'United Kingdom']
booking_channels = ['Online', 'In-person', 'Phone']

# Generate random approximate numbers for tourists (you can replace this with actual data)
np.random.seed(42)
tourist_numbers = np.random.randint(1, 10, size=(len(countries), len(booking_channels)))

data = {'Country': np.repeat(countries, len(booking_channels)),
        'Booking_Channel': booking_channels * len(countries),
        'Tourist_Numbers': tourist_numbers.flatten() * 1e6}  # Convert numbers to millions

df = pd.DataFrame(data)

# Use pivot_table to handle duplicate entries by taking the sum
pivot_df = df.pivot_table(index='Country', columns='Booking_Channel', values='Tourist_Numbers', aggfunc='sum')

# Plotting the bar chart
ax = pivot_df.plot(kind='bar', stacked=True, figsize=(10, 6))

# Adding labels and title
plt.xlabel('Country')
plt.ylabel('Tourist Numbers')
plt.title('Approximate Tourist Numbers by Country and Booking Channel')

for container in ax.containers:
    for bar in container:
        height = bar.get_height()
        label_value = int(round(height / 1e6))  # Convert height to millions and round
        # For stacked bars, the top of a segment is its bottom (get_y) plus its height
        ax.annotate(f'{label_value:,}', (bar.get_x() + bar.get_width() / 2, bar.get_y() + height),
                    ha='center', va='center', color='black', fontsize=8, xytext=(0, -10),
                    textcoords='offset points')

# Move the legend to the right side
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))

# Show the plot
plt.show()

In [None]:
import pandas as pd

# Provided data
data = {
    'Country': ['Belgium', 'Belgium', 'Belgium', 'Croatia', 'Croatia', 'Croatia', 'Czechia', 'Czechia', 'Czechia', 'Finland', 'Finland', 'Finland', 'France', 'France', 'France', 'Germany', 'Germany', 'Germany', 'Greece', 'Greece', 'Greece', 'Hungary', 'Hungary', 'Hungary', 'Ireland', 'Ireland', 'Ireland', 'Italy', 'Italy', 'Italy', 'Netherlands', 'Netherlands', 'Netherlands', 'Portugal', 'Portugal', 'Portugal', 'Spain', 'Spain', 'Spain', 'Sweden', 'Sweden', 'Sweden', 'Switzerland', 'Switzerland', 'Switzerland', 'Turkey', 'Turkey', 'Turkey', 'United Kingdom', 'United Kingdom', 'United Kingdom'],
    'Booking_Channel': ['Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone', 'Online', 'In-person', 'Phone'],
    'Tourist_Numbers': [7000000.00, 4000000.00, 8000000.00, 5000000.00, 7000000.00, 3000000.00, 7000000.00, 8000000.00, 5000000.00, 4000000.00, 8000000.00, 8000000.00, 3000000.00, 6000000.00, 5000000.00, 2000000.00, 8000000.00, 6000000.00, 2000000.00, 5000000.00, 1000000.00, 6000000.00, 9000000.00, 1000000.00, 3000000.00, 7000000.00, 4000000.00, 9000000.00, 3000000.00, 5000000.00, 3000000.00, 7000000.00, 5000000.00, 5000000.00, 9000000.00, 7000000.00, 2000000.00, 4000000.00, 9000000.00, 2000000.00, 9000000.00, 5000000.00, 2000000.00, 4000000.00, 7000000.00, 8000000.00, 3000000.00, 1000000.00, 4000000.00, 2000000.00, 8000000.00, 4000000.00],
}

# Sanity check: pd.DataFrame raises a ValueError if the lists differ in length,
# so report the individual lengths first to make a mismatch easy to locate
lengths = {key: len(values) for key, values in data.items()}
if len(set(lengths.values())) != 1:
    raise ValueError(f"Column lengths differ: {lengths}")

# Create DataFrame
df = pd.DataFrame(data)

# Save to CSV
df.to_csv('tourist_data.csv', index=False)


In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Dataset A
data_A = np.array([1, 2, 3, 4, 5])

# Dataset B
data_B = np.array([2, 3, 3, 4, 4])

# Plot histograms
plt.hist(data_A, alpha=0.5, label='Dataset A')
plt.hist(data_B, alpha=0.5, label='Dataset B')

plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histograms of Dataset A and Dataset B')
plt.legend()

plt.show()


In [None]:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
plt.plot(x, x + 3, 'r:')

In [None]:
import matplotlib.pyplot as plt

# Coordinates of point a
a = (2, 3)

# Coordinates of point b
b = (5, 7)

# Create a scatter plot
plt.scatter(*a, color='blue', label='Point a')
plt.scatter(*b, color='red', label='Point b')

# Add labels and legend
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot of Points a and b')
plt.legend()

# Display the plot
plt.show()

In [None]:
import matplotlib.pyplot as plt
a = [2,3]
b = [-2, 9]
plt.plot(a,b)

In [None]:
import matplotlib.pyplot as plt
a = [2,3]
b = [-2, 9]
plt.plot([a[0],b[0]],[a[1],b[1]])

In [None]:
import matplotlib.pyplot as plt
import numpy as np
a = np.random.randn(100, 2)
plt.plot(a[:,0],a[:,1] ,'.')

In [None]:
import matplotlib.pyplot as plt


def assignment_01():
    # Define the coordinates of the two lines
    line1_start = [-1, 4]
    line1_end = [3, 5]
    line2_start = [3, 6]
    line2_end = [1, 8]

    # Create the plot
    plt.figure()

    # Draw the first line: mark its start point with a red 'x' (a single point, so no line is drawn yet;
    # this call carries the legend label)
    plt.plot(line1_start[0], line1_start[1], 'r-x', label='Line 1')
    # Connect the start and end points with a semi-transparent red line
    plt.plot([line1_start[0], line1_end[0]], [line1_start[1], line1_end[1]], 'r-', alpha=0.5)

    # Draw the second line: mark its start point with a black 'o' (this call carries the legend label)
    plt.plot(line2_start[0], line2_start[1], 'k-o', label='Line 2')
    # Connect the start and end points with a semi-transparent black line
    plt.plot([line2_start[0], line2_end[0]], [line2_start[1], line2_end[1]], 'k-', alpha=0.5)

    # Add labels and title
    plt.xlabel('X-axis')
    plt.ylabel('Y-axis')
    plt.title('Assignment 01')

    # Add legend
    plt.legend()

    # Show the plot
    plt.show()

# Call the function to draw the lines
assignment_01()