- It is often necessary to read multiple files that share the same format and structure  
  and combine them into a single data frame for analysis.  
- The example below combines six monthly revenue files for the first half of 2023  
  into one data frame.  
- The files could also be read one by one and merged manually,  
  but the code below is more practical when there are many files to read and merge.

In [1]:
import pandas as pd
import glob

1. Specify the folder path where the files are located and check the number and list of target files.

In [2]:
# '**' with recursive=True searches the folder and all of its subfolders.
path = './data/revenue/**/*.xlsx'

# glob returns matches in arbitrary order, so sort to keep the months in sequence.
file_list_excel = sorted(glob.glob(path, recursive=True))

print(len(file_list_excel))
print(file_list_excel)

6
['./data/revenue\\sample_data_revenue_2301.xlsx', './data/revenue\\sample_data_revenue_2302.xlsx', './data/revenue\\sample_data_revenue_2303.xlsx', './data/revenue\\sample_data_revenue_2304.xlsx', './data/revenue\\sample_data_revenue_2305.xlsx', './data/revenue\\sample_data_revenue_2306.xlsx']
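The `**` pattern combined with `recursive=True` also descends into subfolders, which matters if the revenue folder holds archived files. The following is a minimal sketch on a throwaway folder; the file and folder names are hypothetical and stand in for real data.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one Excel file at the top level,
    # one inside a subfolder, and one non-Excel file.
    os.makedirs(os.path.join(tmp, "archive"))
    for rel in ["rev_2301.xlsx", os.path.join("archive", "rev_2212.xlsx"), "readme.txt"]:
        open(os.path.join(tmp, rel), "w").close()

    # '**' matches the folder itself and every subfolder below it.
    pattern = os.path.join(tmp, "**", "*.xlsx")
    matches = sorted(glob.glob(pattern, recursive=True))

    # Paths relative to the temporary folder, for readable output.
    found = [os.path.relpath(m, tmp) for m in matches]

print(found)
```

Both Excel files are listed, including the one in the subfolder, while the text file is skipped. If only the top-level folder should be searched, a non-recursive pattern such as `*.xlsx` would be the safer choice.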


2. Read each file in the list and merge the results into a single data frame.

In [3]:
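# Read every file into its own data frame, then concatenate them in one call.
frames = [pd.read_excel(file) for file in file_list_excel]

# ignore_index=True renumbers the rows from 0, like reset_index(drop=True).
df = pd.concat(frames, ignore_index=True)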
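The reason the index is rebuilt after merging can be seen on a small example. This sketch uses two in-memory frames with made-up values in place of the monthly files, so it runs without any Excel data.

```python
import pandas as pd

# Two small stand-in frames (hypothetical values) playing the role of two monthly files.
jan = pd.DataFrame({"date": pd.to_datetime(["2023-01-01", "2023-01-02"]),
                    "revenue": [10, 20]})
feb = pd.DataFrame({"date": pd.to_datetime(["2023-02-01", "2023-02-02"]),
                    "revenue": [30, 40]})

# Plain concatenation keeps each frame's own row labels, so 0 and 1 appear twice.
stacked = pd.concat([jan, feb])
print(stacked.index.tolist())   # [0, 1, 0, 1]

# ignore_index=True renumbers the rows 0..n-1, the same result as reset_index(drop=True).
merged = pd.concat([jan, feb], ignore_index=True)
print(merged.index.tolist())    # [0, 1, 2, 3]
```

Without the renumbering, a label-based lookup such as `stacked.loc[0]` would return one row per file rather than a single row.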

3. Check the completed data frame.
- The month values in the 'date' column  
  show that the data from January through June were combined correctly.

In [4]:
print(df.shape)
df.head()

(181, 2)


        date  revenue
0 2023-01-01     1240
1 2023-01-02     2147
2 2023-01-03     2085
3 2023-01-04     2891
4 2023-01-05     2901


In [5]:
df.tail()

          date  revenue
176 2023-06-26     2289
177 2023-06-27     1560
178 2023-06-28     1067
179 2023-06-29     1992
180 2023-06-30     2263


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 181 entries, 0 to 180
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype         
---  ------   --------------  -----         
 0   date     181 non-null    datetime64[ns]
 1   revenue  181 non-null    int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 3.0 KB


In [8]:
df['date'].dt.month.unique()

array([1, 2, 3, 4, 5, 6])
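Seeing all six months only confirms that every file contributed at least one row. A stricter check, sketched below, compares the dates against the full calendar for the period. The `demo` frame is a stand-in built from the calendar itself with a placeholder revenue of 0. With real data, the merged `df` would take its place.

```python
import pandas as pd

# Every calendar day from January 1 to June 30, 2023.
expected_days = pd.date_range("2023-01-01", "2023-06-30", freq="D")

# Stand-in for the merged frame: one row per day, revenue is a placeholder 0.
demo = pd.DataFrame({"date": expected_days, "revenue": 0})

# Rows per month; a short or duplicated file would show up as an unexpected count.
rows_per_month = demo["date"].dt.month.value_counts().sort_index()
print(rows_per_month)

# Dates that are expected but absent, and dates that occur more than once.
missing = expected_days.difference(pd.DatetimeIndex(demo["date"]))
duplicated = demo.loc[demo["date"].duplicated(), "date"]
print(len(expected_days), len(missing), len(duplicated))
```

The period contains 31 + 28 + 31 + 30 + 31 + 30 = 181 days, which agrees with the 181 rows reported by `df.shape` above.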