# Data Concatenation
Not to be confused with merging, concatenating DataFrames stacks them along an axis: row-wise concatenation reuses the same columns and adds rows, while column-wise concatenation adds columns. It's useful when your data is split into partitions that you need to combine into a single DataFrame.

In [1]:
# # Concatenate uber1, uber2, and uber3: row_concat
# row_concat = pd.concat([uber1, uber2, uber3])

# # Print the shape of row_concat
# print(row_concat.shape)

# # Print the head of row_concat
# print(row_concat.head())
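
Since the cell above depends on the pre-loaded uber1, uber2, and uber3 DataFrames, here is a minimal, self-contained sketch of the same row-wise concatenation. The names `part_a` and `part_b` and their contents are hypothetical stand-ins, not the course data.

```python
import pandas as pd

# Hypothetical partitions that share the same columns
part_a = pd.DataFrame({"pickup_hour": [0, 1], "trips": [12, 7]})
part_b = pd.DataFrame({"pickup_hour": [2, 3], "trips": [5, 9]})

# Row-wise concatenation stacks the rows and keeps the columns
stacked = pd.concat([part_a, part_b])
print(stacked.shape)            # (4, 2)
print(stacked.index.tolist())   # [0, 1, 0, 1] -- original row labels are kept

# ignore_index=True builds a fresh 0..n-1 index instead
stacked_reset = pd.concat([part_a, part_b], ignore_index=True)
print(stacked_reset.index.tolist())  # [0, 1, 2, 3]
```

Note that each partition keeps its own row labels unless you pass `ignore_index=True`, which matters if you later select rows by label.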

By default, axis=0, meaning that the DataFrames are concatenated row-wise. Setting axis=1 concatenates them column-wise.

In [2]:
# # Concatenate ebola_melt and status_country column-wise: ebola_tidy
# ebola_tidy = pd.concat([ebola_melt, status_country], axis=1)

# # Print the shape of ebola_tidy
# print(ebola_tidy.shape)

# # Print the head of ebola_tidy
# print(ebola_tidy.head())
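
As a rough illustration of what axis=1 does, the sketch below concatenates two small frames column-wise. The frames `left`, `right`, and `shifted` are hypothetical and only loosely modelled on the ebola example. It also shows that rows are lined up by index label, not by position.

```python
import pandas as pd

# Hypothetical frames with the same default index (0, 1, 2)
left = pd.DataFrame({"cases": [10, 20, 30]})
right = pd.DataFrame({"status": ["Cases", "Cases", "Deaths"],
                      "country": ["Guinea", "Liberia", "Guinea"]})

# Column-wise concatenation places the columns side by side
wide = pd.concat([left, right], axis=1)
print(wide.shape)  # (3, 3)

# Rows are aligned on index labels; labels missing from one frame give NaN
shifted = pd.DataFrame({"status": ["x", "y"]}, index=[1, 5])
misaligned = pd.concat([left, shifted], axis=1)
print(misaligned.shape)  # (4, 2): labels 0, 1, 2 and 5
print(misaligned)
```

If the two frames are meant to line up by position, it is safer to check that their indexes match (or call `reset_index(drop=True)` on both) before concatenating.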

### Finding files that match a pattern
You're now going to practice using the glob module to find all csv files in the workspace. In the next exercise, you'll programmatically load them into DataFrames.

The glob module has a function called glob that takes a pattern and returns a list of the files in the working directory that match that pattern.

For example, if the file names consist of part_, a single digit, and then .csv, you can write the pattern as 'part_?.csv' (which would match part_1.csv, part_2.csv, part_3.csv, and so on).

Similarly, you can find all .csv files with '*.csv', or all parts with 'part_*'. The ? wildcard matches exactly one character, and the * wildcard matches any number of characters, including none.

In [3]:
# # Import necessary modules
# import glob
# import pandas as pd

# # Write the pattern: pattern
# pattern = '*.csv'

# # Save all file matches: csv_files
# csv_files = glob.glob(pattern)

# # Print the file names
# print(csv_files)

# # Load the second file into a DataFrame: csv2
# csv2 = pd.read_csv(csv_files[1])

# # Print the head of csv2
# print(csv2.head())
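
To see how the two wildcards differ without relying on the course workspace, the sketch below creates a few empty files in a temporary directory and matches them. The file names and the `match` helper are hypothetical and exist only for this illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Create a few empty, hypothetical files to match against
    for name in ["part_1.csv", "part_2.csv", "part_10.csv", "notes.txt"]:
        open(os.path.join(workdir, name), "w").close()

    def match(pattern):
        """Return the sorted base names in workdir that match pattern."""
        # glob.escape protects any special characters in the directory path
        paths = glob.glob(os.path.join(glob.escape(workdir), pattern))
        return sorted(os.path.basename(p) for p in paths)

    single_digit = match("part_?.csv")  # ? matches exactly one character
    all_csv = match("*.csv")            # * matches any number of characters
    all_parts = match("part_*")

print(single_digit)  # ['part_1.csv', 'part_2.csv']
print(all_csv)       # ['part_1.csv', 'part_10.csv', 'part_2.csv']
print(all_parts)     # ['part_1.csv', 'part_10.csv', 'part_2.csv']
```

Because glob.glob returns matches in arbitrary order, the results are sorted here. Without sorting, an expression like `csv_files[1]` is not guaranteed to pick the same file on every system.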

### Iterating and concatenating all matches
Now that you have a list of filenames to load, you can load all the files into a list of DataFrames that can then be concatenated.

You'll start with an empty list called frames. Your job is to use a for loop to:

- iterate through each of the filenames,
- read each filename into a DataFrame, and then
- append it to the frames list.

You can then concatenate this list of DataFrames using pd.concat(). Go for it!

In [4]:
# # Create an empty list: frames
# frames = []

# #  Iterate over csv_files
# for csv in csv_files:

#     #  Read csv into a DataFrame: df
#     df = pd.read_csv(csv)
    
#     # Append df to frames
#     frames.append(df)

# # Concatenate frames into a single DataFrame: uber
# uber = pd.concat(frames)

# # Print the shape of uber
# print(uber.shape)

# # Print the head of uber
# print(uber.head())
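
The whole find, load, and concatenate workflow can be tried end to end with generated files. This is only a sketch under stated assumptions: the partition files, their columns, and the name `combined` are hypothetical, and a list comprehension stands in for the explicit for loop used above.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as workdir:
    # Write three small hypothetical partitions: part_1.csv .. part_3.csv
    for i in range(1, 4):
        part = pd.DataFrame({"part": [i, i], "value": [i * 10, i * 10 + 1]})
        part.to_csv(os.path.join(workdir, f"part_{i}.csv"), index=False)

    # Find the partitions; sort so the load order is deterministic
    pattern = os.path.join(glob.escape(workdir), "part_?.csv")
    csv_files = sorted(glob.glob(pattern))

    # Read every file into its own DataFrame
    frames = [pd.read_csv(path) for path in csv_files]

# Stack all partitions into one DataFrame with a fresh index
combined = pd.concat(frames, ignore_index=True)
print(combined.shape)  # (6, 2)
print(combined.head())
```

Collecting the frames in a list and calling pd.concat() once is generally preferable to concatenating inside the loop, which copies the accumulated data on every iteration.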

# Data Merge

Merging data allows you to combine disparate datasets into a single dataset to do more complex analysis.

Two DataFrames have been pre-loaded: site and visited. Explore them in the IPython Shell and take note of their structure and column names. Your task is to perform a 1-to-1 merge of these two DataFrames using the 'name' column of site and the 'site' column of visited.

In [5]:
# # Merge the DataFrames: o2o
# o2o = pd.merge(left=site, right=visited, left_on='name', right_on='site')

# # Print o2o
# print(o2o.head())

In [6]:
# # Merge the DataFrames: m2o
# m2o = pd.merge(left=site, right=visited, left_on='name', right_on='site')

# # Print m2o
# print(m2o)
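
The two cells above use the same pd.merge call; whether the result is one-to-one or many-to-one depends only on whether the key values are unique in each DataFrame. The sketch below illustrates this with small hypothetical frames (`site_demo`, `visited_1to1`, `visited_many`) and uses the `validate` argument of pd.merge to check the expected relationship.

```python
import pandas as pd

# Hypothetical stand-ins for the pre-loaded site and visited DataFrames
site_demo = pd.DataFrame({"name": ["A", "B"], "lat": [1.0, 2.0]})
visited_1to1 = pd.DataFrame({"ident": [10, 11], "site": ["A", "B"]})
visited_many = pd.DataFrame({"ident": [10, 11, 12], "site": ["A", "A", "B"]})

# One-to-one: every key appears once on each side
o2o_demo = pd.merge(site_demo, visited_1to1,
                    left_on="name", right_on="site", validate="one_to_one")
print(o2o_demo.shape)  # (2, 4)

# Keys are unique in site_demo but repeated in visited_many, so the
# matching site row is repeated for each visit
m2o_demo = pd.merge(site_demo, visited_many,
                    left_on="name", right_on="site", validate="one_to_many")
print(m2o_demo.shape)  # (3, 4)

# Asking for a one-to-one check on the duplicated keys raises MergeError
try:
    pd.merge(site_demo, visited_many,
             left_on="name", right_on="site", validate="one_to_one")
    one_to_one_ok = True
except pd.errors.MergeError:
    one_to_one_ok = False
print(one_to_one_ok)  # False
```

pandas names the relationship from left to right, so with the site data on the left this case is spelled "one_to_many"; swapping the two frames would make it "many_to_one".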

### Many-to-many data merge
The final merging scenario occurs when neither DataFrame has unique keys for the merge. In that case, for each duplicated key, every pairwise combination of the matching rows is created.

Two example DataFrames that share common key values have been pre-loaded: df1 and df2. Another DataFrame df3, which is the result of df1 merged with df2, has been pre-loaded. All three DataFrames have been printed - look at the output and notice how pairwise combinations have been created. This example is to help you develop your intuition for many-to-many merges.
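
Because the printed output is not reproduced here, the following sketch rebuilds the idea with two small hypothetical frames, `left_demo` and `right_demo`. They are not the course's df1 and df2, only an illustration of how pairwise combinations arise.

```python
import pandas as pd

# Both frames contain the key 'a' twice and the key 'b' once
left_demo = pd.DataFrame({"key": ["a", "a", "b"], "left_val": [1, 2, 3]})
right_demo = pd.DataFrame({"key": ["a", "a", "b"], "right_val": [10, 20, 30]})

# Every left row is paired with every right row that has the same key
pairs = pd.merge(left_demo, right_demo, on="key")
print(len(pairs))  # 5 rows: 2 x 2 for 'a' plus 1 x 1 for 'b'
print(pairs)
```

The row count grows multiplicatively with the number of duplicates per key, so it is worth checking the shape of a many-to-many result before using it further.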

Here, you'll work with the site and visited DataFrames from before, and a new survey DataFrame. Your task is to merge site and visited as you did in the earlier exercises. You will then merge this merged DataFrame with survey.

Begin by exploring the site, visited, and survey DataFrames in the IPython Shell.

In [7]:
# # Merge site and visited: m2m
# m2m = pd.merge(left=site, right=visited, left_on='name', right_on='site')

# # Merge m2m and survey: m2m
# m2m = pd.merge(left=m2m, right=survey, left_on='ident', right_on='taken')

# # Print the first 20 lines of m2m
# print(m2m.head(20))