## Experiment 4

### Data Concatenation: Concatenate multiple datasets along rows or columns to create a unified dataset. 

### 1. Concatenating along Rows
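To stack datasets vertically (along the rows), pass them as a list to `pd.concat()` with `axis=0`, which is also the default. Each frame keeps its original row labels, so the index values repeat in the output below: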

In [7]:
import pandas as pd

# Creating sample datasets
data1 = {'A': [1, 2, 3], 'B': [4, 5, 6]}
data2 = {'A': [7, 8, 9], 'B': [10, 11, 12]}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Concatenate along rows (axis=0)
concatenated_df = pd.concat([df1, df2], axis=0)

print("Concatenating along Rows:\n")
print(concatenated_df)

Concatenating along Rows:

   A   B
0  1   4
1  2   5
2  3   6
0  7  10
1  8  11
2  9  12
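
The repeated row labels (0, 1, 2, 0, 1, 2) in the output above can matter for later lookups. The following is a minimal sketch of what that looks like in practice, and of the `keys` parameter of `pd.concat()` as one way to record which frame each row came from. The key names `'first'` and `'second'` are illustrative labels, not part of the original notebook.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]})

stacked = pd.concat([df1, df2], axis=0)

# Label 0 now appears twice, so .loc[0] returns two rows instead of one
rows_labelled_zero = stacked.loc[0]
has_unique_index = stacked.index.is_unique

# keys= adds an outer index level that records the source frame of each row
# ('first' and 'second' are illustrative labels chosen for this sketch)
labelled = pd.concat([df1, df2], keys=['first', 'second'])
second_block = labelled.loc['second']

print(rows_labelled_zero)
print("Index is unique:", has_unique_index)
print(second_block)
```

If the original labels carry no meaning, renumbering the rows is usually simpler than adding a second index level.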


### Wine Dataset

In [11]:
# Creating two sample Wine datasets
wine_data1 = {'Alcohol': [13.2, 13.5, 13.8], 
              'Malic_Acid': [2.34, 1.23, 2.67], 
              'Ash': [2.5, 2.4, 2.6]}

wine_data2 = {'Alcohol': [12.9, 13.1, 12.8], 
              'Malic_Acid': [1.98, 2.01, 1.95], 
              'Ash': [2.3, 2.4, 2.5]}

df_wine1 = pd.DataFrame(wine_data1)
df_wine2 = pd.DataFrame(wine_data2)

print("First Wine Dataset:\n\n", df_wine1)
print("\nSecond Wine Dataset:\n\n", df_wine2)

First Wine Dataset:

    Alcohol  Malic_Acid  Ash
0     13.2        2.34  2.5
1     13.5        1.23  2.4
2     13.8        2.67  2.6

Second Wine Dataset:

    Alcohol  Malic_Acid  Ash
0     12.9        1.98  2.3
1     13.1        2.01  2.4
2     12.8        1.95  2.5


In [13]:
# Concatenating along rows
wine_concat_rows = pd.concat([df_wine1, df_wine2], axis=0)

print("Concatenated Along Rows:\n\n", wine_concat_rows)

Concatenated Along Rows:

    Alcohol  Malic_Acid  Ash
0     13.2        2.34  2.5
1     13.5        1.23  2.4
2     13.8        2.67  2.6
0     12.9        1.98  2.3
1     13.1        2.01  2.4
2     12.8        1.95  2.5
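
Both wine frames above share exactly the same columns. As a hedged illustration of what happens when they do not, the sketch below adds a hypothetical `Magnesium` column (with made-up values, not taken from the original data) to a second frame and compares the default `join='outer'` with `join='inner'`.

```python
import pandas as pd

df_wine_a = pd.DataFrame({'Alcohol': [13.2, 13.5], 'Ash': [2.5, 2.4]})
# 'Magnesium' and its values are hypothetical, added only for this illustration
df_wine_b = pd.DataFrame({'Alcohol': [12.9, 13.1], 'Magnesium': [101, 98]})

# Default join='outer': keep the union of columns; missing cells become NaN
outer = pd.concat([df_wine_a, df_wine_b], ignore_index=True)

# join='inner': keep only the columns present in every frame
inner = pd.concat([df_wine_a, df_wine_b], join='inner', ignore_index=True)

print(outer)
print(inner)
```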


### 2. Concatenating along Columns

To place datasets side by side (along the columns), set `axis=1`. Rows are matched by index label.

Column names are kept unchanged, so frames that share names, as `df1` and `df2` do, produce duplicate column labels in the result. Rename the columns if they need to be told apart.

In [6]:
# Concatenate along columns (axis=1)
concatenated_df = pd.concat([df1, df2], axis=1)

print("Concatenating along Columns:\n")
print(concatenated_df)

Concatenating along Columns:

   A  B  A   B
0  1  4  7  10
1  2  5  8  11
2  3  6  9  12
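
The duplicate `A` and `B` labels in the output above make column selection ambiguous. Two possible ways to avoid this are sketched here, assuming the suffixes `_1`/`_2` and the keys `'df1'`/`'df2'` are acceptable names (they are illustrative choices): renaming with `add_suffix()` before concatenating, or passing `keys` to build a two-level column index.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, 12]})

# Option 1: give each frame's columns a distinguishing suffix first
suffixed = pd.concat([df1.add_suffix('_1'), df2.add_suffix('_2')], axis=1)

# Option 2: keys= creates a two-level column index (source frame, column name)
grouped = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])
a_from_df2 = grouped[('df2', 'A')]

print(suffixed)
print(grouped)
```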


### Wine Dataset

In [14]:
# Concatenating along columns
wine_concat_columns = pd.concat([df_wine1, df_wine2], axis=1)

print("Concatenated Along Columns:\n\n", wine_concat_columns)

Concatenated Along Columns:

    Alcohol  Malic_Acid  Ash  Alcohol  Malic_Acid  Ash
0     13.2        2.34  2.5     12.9        1.98  2.3
1     13.5        1.23  2.4     13.1        2.01  2.4
2     13.8        2.67  2.6     12.8        1.95  2.5
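
The column-wise result above lines up cleanly because both frames have the index 0, 1, 2. With `axis=1`, `pd.concat()` matches rows by index label rather than by position, which the following sketch illustrates using two small frames whose indices only partly overlap. The frames `left` and `right` are constructed for this example.

```python
import pandas as pd

# Two illustrative frames whose row labels only partly overlap
left = pd.DataFrame({'Alcohol': [13.2, 13.5, 13.8]}, index=[0, 1, 2])
right = pd.DataFrame({'Ash': [2.3, 2.4, 2.5]}, index=[1, 2, 3])

# Default join='outer': union of labels, NaN where a frame has no such row
aligned = pd.concat([left, right], axis=1)

# join='inner': keep only the labels present in both frames
overlap = pd.concat([left, right], axis=1, join='inner')

# To pair rows purely by position, reset both indices first
positional = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)], axis=1
)

print(aligned)
print(overlap)
print(positional)
```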


### 3. Using Append

The `DataFrame.append()` method is another way to combine datasets by adding rows. However, it was deprecated in `pandas` 1.4 and removed in `pandas` 2.0, where calling it raises an `AttributeError`. `pd.concat()` is the recommended replacement. Nevertheless, here is how `append()` works on versions earlier than 2.0:

### Example with `append()` (deprecated in 1.4, removed in 2.0)

In [8]:
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Appending df2 to df1 (runs only on pandas < 2.0)
appended_df = df1.append(df2)

print("Using append function:\n")
print(appended_df)

Using append function:

   A   B
0  1   4
1  2   5
2  3   6
0  7  10
1  8  11
2  9  12



### Migrating from `append()` to `pd.concat()`
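Since `append()` is no longer available in current versions of `pandas`, replace `df1.append(df2)` with `pd.concat([df1, df2])`.

Passing `ignore_index=True` to `concat()` discards the original row labels and renumbers the result from 0, just as `append(..., ignore_index=True)` did. Without it, both keep the original labels, as in the `append()` output above.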

In [9]:
# Using concat() as a replacement for append()
concatenated_df = pd.concat([df1, df2], ignore_index=True)

print("Using concat instead of append function:\n")
print(concatenated_df)

Using concat instead of append function:

   A   B
0  1   4
1  2   5
2  3   6
3  7  10
4  8  11
5  9  12
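
A common use of `append()` was growing a DataFrame inside a loop. One possible migration pattern, sketched here under the assumption that the pieces arrive as a list of dictionaries (the `batches` list is hypothetical), is to collect the frames in a Python list and call `pd.concat()` once at the end.

```python
import pandas as pd

# Hypothetical batches of rows arriving one at a time
batches = [
    {'A': [1, 2], 'B': [4, 5]},
    {'A': [3], 'B': [6]},
    {'A': [7, 8, 9], 'B': [10, 11, 12]},
]

# Collect the pieces in a list instead of appending to a DataFrame each time
frames = [pd.DataFrame(batch) for batch in batches]

# Concatenate once; ignore_index=True renumbers the rows from 0
combined = pd.concat(frames, ignore_index=True)

print(combined)
```

Concatenating once avoids copying the accumulated data on every iteration, which is what repeated appends did.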


### Wine Dataset

In [16]:
# Appending the second dataset to the first (runs only on pandas < 2.0)
wine_appended = df_wine1.append(df_wine2)

print("Appended Dataset:\n\n", wine_appended)

Appended Dataset:

    Alcohol  Malic_Acid  Ash
0     13.2        2.34  2.5
1     13.5        1.23  2.4
2     13.8        2.67  2.6
0     12.9        1.98  2.3
1     13.1        2.01  2.4
2     12.8        1.95  2.5



In [18]:
# Using concat() as a replacement for append()
wine_concat_as_append = pd.concat([df_wine1, df_wine2], ignore_index=True)

print("Concatenated (as append) Dataset:\n\n", wine_concat_as_append)

Concatenated (as append) Dataset:

    Alcohol  Malic_Acid  Ash
0     13.2        2.34  2.5
1     13.5        1.23  2.4
2     13.8        2.67  2.6
3     12.9        1.98  2.3
4     13.1        2.01  2.4
5     12.8        1.95  2.5
