<h1 dir=rtl align=center style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Straight Bars and Coils
</font>
</h1>

<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">
You can download the datasets for this exercise from 
<a href="https://docs.google.com/spreadsheets/d/1WHi8Btw9YCuF0vbkBgphPoWb1rmxyZkA/edit?usp=drive_link&ouid=102809079943831791310&rtpof=true&sd=true">here (Steel)</a> 
and 
<a href="https://docs.google.com/spreadsheets/d/1MdzmRdPKIAZMPu3XlnognXFPf8t6_XPa/edit?usp=drive_link&ouid=102809079943831791310&rtpof=true&sd=true">here (Dimdate)</a>.
</font>
</p>


<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">
Hot-rolled steel is steel that has been rolled at very high temperatures (above 1700°F), which exceed the recrystallization temperature of most types of steel. At these temperatures the steel is easier to shape, so it can be formed into a wider variety of products.
</font>
</p>

<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">
Because of market conditions, diverse sales, and a high sales volume, a hot-rolling steel company needs dynamic and varied reports.  
Production and sales in this industry generally work as follows: steel billets enter the system, are rolled, and the rolled sections are delivered to customers. A customer orders a specific number of bundles (a bundle is a packaged group of rebar in the factory); however, sales are recorded by tonnage, and the price is calculated per kilogram.  
Each bundle corresponds to a billet weighing 2 tons, but the billet is not fully used during production, so the final product weighs less than the ordered weight.  
The difference between the nominal bundle weight and the weight actually sold is therefore the amount of waste.
</font>
</p>
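
As a quick worked example of the waste definition above, the sketch below takes one order of 13 bundles with 25,934 sold (the figures of the first row in the Transaction preview) and assumes that the sold weight is recorded in kilograms. The variable names are illustrative only.

```python
# Nominal billet weight per bundle, in kilograms (2 tons, as stated above)
BUNDLE_WEIGHT_KG = 2 * 1000

bundles = 13            # number of bundles ordered
real_weight_kg = 25934  # weight actually sold (assumed to be in kilograms)

nominal_weight_kg = bundles * BUNDLE_WEIGHT_KG  # 13 * 2000 = 26000
waste_kg = nominal_weight_kg - real_weight_kg   # 26000 - 25934 = 66

print(nominal_weight_kg, waste_kg)
```

Under that assumption, this order carries 66 kg of waste.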

<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">
After importing the required libraries, start by getting an overview of the data.  
The Steel Excel file contains two sheets, so read each sheet into its own variable.  
The sheets are named **Transaction** and **Customer**.  
Finally, read the **Dimdate** file into a third variable.
</font>
</p>


In [1]:
import pandas as pd
import numpy as np
from openpyxl import load_workbook

<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">
The **Transaction** table has 14 columns as described below:
</font>
</p>

<center>
<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">

| Column Name       | Description                                                                                 |
|:------------------:|:-------------------------------------------------------------------------------------------:|
| Transaction_ID     | Transaction identifier                                                                     |
| Product_ID         | Product identifier                                                                         |
| Export             | Indicates whether the product is for export or domestic consumption                        |
| Industry           | Specifies the type of customer industry                                                   |
| City_Category      | Location category relative to the province of the customer's head office                  |
| Stay_In_Customer   | Number of days the customer's order stayed at the factory                                  |
| Material_Status    | Indicates whether the customer's product was supplied from inventory or is in production  |
| Payment            | Order payment amount in rials                                                              |
| Unit_Price         | Price per kilogram of the product in rials                                                 |
| Save_Date_Time     | Order registration time in the Persian calendar                                            |
| Send_Date          | Shipping date in the Gregorian calendar                                                   |
| Customer_ID        | Customer code                                                                             |
| Bandel             | Number of bundles (each bundle equals **2 ± 0.05** tons; 2 tons are used in calculations) |
| real_weight        | Actual weight sold (named `real_weight` in the sheet)                                     |

</font>
</p>
</center>


In [2]:
Transaction = pd.read_excel("/kaggle/input/steel-data/Steel.xlsx", sheet_name="Transaction")
Transaction.head()

Unnamed: 0,Transaction_ID,Product_ID,Export,Industry,City_Category,Stay_In_Customer,Material_Status,Payment,Unit_Price,Save_Date_Time,Send_Date,Customer_ID,Bandel,real_weight
0,1003590,R.S.102001,0,جوش,A,1,0,2838150000,109437.418061,1400/12/23 08:47,2022-03-16,CU00047,13,25934
1,1005621,R.S.102002,1,تراش,A,1,0,16065000000,619313.801079,1400/12/21 10:10,2022-03-12,CU00079,13,25940
2,1001950,R.S.102003,1,پرچ,C,1,0,17228900000,664080.326858,1400/12/16 15:02,2022-03-08,CU00031,13,25944
3,1001724,R.S.102004,1,ساختمان,C,3,0,13972400000,637805.267723,1400/12/16 10:03,2022-03-08,CU00083,11,21907
4,1004480,R.S.102005,1,ساختمان,B,1,0,11947100000,460726.543519,1400/12/14 09:34,2022-03-09,CU00001,13,25931


<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">
The **Customer** sheet contains six columns as described below:
</font>
</p>

<ul>
<center>
<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">

| Column Name        | Description                                                                 |
|:------------------:|:---------------------------------------------------------------------------:|
| Customer_ID        | Customer code                                                              |
| Province           | Name of the province                                                      |
| City_Category      | Location category relative to the province of the customer's head office   |
| Stay_In_Customer   | Number of days the customer's order stayed at the factory                  |
| Export/Import      | Indicates whether the product is for export or domestic consumption        |
| Industry_Cod       | Industry code                                                             |

</font>
</p>
</center>
</ul>

<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">
In this table, the **Industry_Cod** column gives a more detailed classification of industries:  
- Codes 0 to 4 represent the **construction** industry.  
- Codes 5 to 9 represent the **machining** industry.  
- Codes 10 to 13 represent the **riveting** industry.  
- Codes 14 to 17 represent the **drawing** industry.  
- Codes 18 to 20 represent the **welding** industry.  
</font>
</p>
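
The code ranges above can be turned into a lookup in several ways. The following is a minimal sketch using `pd.cut`; the bin edges are derived from the ranges listed above, and the column name `Industry_Group` and the sample codes are made up for illustration.

```python
import pandas as pd

# Upper edge of each code range listed above. pd.cut uses right-closed bins,
# so (-1, 4] covers codes 0..4, (4, 9] covers 5..9, and so on.
bins = [-1, 4, 9, 13, 17, 20]
labels = ["construction", "machining", "riveting", "drawing", "welding"]

# Hypothetical sample of industry codes
codes = pd.Series([0, 4, 5, 12, 15, 20], name="Industry_Cod")
industry_group = pd.cut(codes, bins=bins, labels=labels).astype(str)

print(pd.DataFrame({"Industry_Cod": codes, "Industry_Group": industry_group}))
```

Binning keeps the range definitions in one place, which may be easier to maintain than a chain of `if` statements.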


In [3]:
Customer = pd.read_excel("/kaggle/input/steel-data/Steel.xlsx", sheet_name="Customer")
Customer.head()

Unnamed: 0,Customer_ID,Province,City_Category,Stay_In_Customer,Export/Import,Industry_Cod
0,CU00047,تهران,A,1,0,15
1,CU00079,تهران,A,1,1,7
2,CU00163,تهران,A,0,1,10
3,CU00051,تهران,A,1,0,20
4,CU00057,تهران,A,0,1,4


<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">
The **Dimdate** table is a helper table that can be used to convert Gregorian dates to Persian (Solar Hijri) dates and vice versa. It covers the years 1320 to 1429 of the Persian calendar and includes comprehensive calendar information such as:  
- Day  
- Month  
- Year  
- Month name  
- Day of the week  
- Week number of the year  
</font>
</p>
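
To illustrate how such a helper table can serve as a date lookup, here is a small sketch. It builds a three-row stand-in for Dimdate from the first dates of the table (columns `Jalali_1` and `Miladi`) and joins it to a hypothetical `orders` frame; the order timestamps are invented for the example.

```python
import pandas as pd

# Tiny stand-in for Dimdate, using the first three dates of the table
dimdate = pd.DataFrame({
    "Jalali_1": ["1320/01/01", "1320/01/02", "1320/01/03"],
    "Miladi": pd.to_datetime(["1941-03-21", "1941-03-22", "1941-03-23"]),
})

# Hypothetical orders with a Persian timestamp in 'YYYY/MM/DD HH:MM' form
orders = pd.DataFrame({"Save_Date_Time": ["1320/01/02 08:47", "1320/01/03 15:02"]})

# Keep only the date part so it matches the Jalali_1 key, then look it up
orders["Jalali_1"] = orders["Save_Date_Time"].str.split(" ").str[0]
orders = orders.merge(dimdate, on="Jalali_1", how="left")

# Gregorian year and month come straight from the looked-up date
orders["Year"] = orders["Miladi"].dt.year
orders["Month"] = orders["Miladi"].dt.month
print(orders[["Save_Date_Time", "Miladi", "Year", "Month"]])
```

A left join keeps every order even if its date is missing from the lookup table; such rows would simply get an empty `Miladi` value.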


In [4]:
dates = pd.read_excel("/kaggle/input/dimdate/Dimdate.xlsx")
dates.head()

Unnamed: 0,Miladi,Jalali_1,Jalali_2,Jalali_3,Jalali_4,Miladi.1,jyear,mmonthN,jmonthN,mmonthT,...,mnime,jnime,JquarterN,JQuarterT,MquarterN,JWeekDay,MWeekDay,MWeekNum,JWeekNum,تعطیل/غیرتعطیل
0,1941-03-21,1320/01/01,1320/1/1,20/1/1,1320.01.1,1941-03-21,1320,3,1,March,...,First Half of Year,نیمه اول سال,1,بهار,1,جمعه,Friday,12,1,
1,1941-03-22,1320/01/02,1320/1/2,20/1/2,1320.01.2,1941-03-22,1320,3,1,March,...,First Half of Year,نیمه اول سال,1,بهار,1,شنبه,Saturday,12,2,
2,1941-03-23,1320/01/03,1320/1/3,20/1/3,1320.01.3,1941-03-23,1320,3,1,March,...,First Half of Year,نیمه اول سال,1,بهار,1,یکشنبه,Sunday,13,2,
3,1941-03-24,1320/01/04,1320/1/4,20/1/4,1320.01.4,1941-03-24,1320,3,1,March,...,First Half of Year,نیمه اول سال,1,بهار,1,دوشنبه,Monday,13,2,
4,1941-03-25,1320/01/05,1320/1/5,20/1/5,1320.01.5,1941-03-25,1320,3,1,March,...,First Half of Year,نیمه اول سال,1,بهار,1,سه شنبه,Tuesday,13,2,


<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">
Before merging the **Transaction** and **Customer** dataframes, decisions need to be made regarding each column.  
In the final dataframe, we only want to retain the following columns:  

- **Transaction_ID**  
- **Export**  
- **Industry**  
- **City_Category**  
- **Payment**  
- **Customer_ID**  
- **Bandel**  
- **Real_Weight**  
- **Province**  
- **Industry_Cod**  
- **Waste**  
- **Year**  
- **Month**  
- **Type**  
- **Class**  

<p style="line-height: 200%;">
Add a new column for waste to the dataframe as described earlier. Name the column exactly **Waste**.  
</p>
<p style="line-height: 200%;">
Also, convert the **Bandel** column to kilograms and save it in the same column.  
</p>
<p style="line-height: 200%;">
The **Year** and **Month** columns should be in the Gregorian format. To calculate these values:  
- Add a new column named **Miladi** to the dataframe by converting the Persian dates in the **Save_Date_Time** column to their Gregorian equivalents.  
- Extract the year and month from the **Miladi** column and store them in the **Year** and **Month** columns, respectively.  
- Finally, delete the **Miladi** column after extraction.  
</p>
<p style="line-height: 200%;">
Split the **Product_ID** column into three separate columns using the dot (`.`) as a delimiter:  
- Name the first column **Type**.  
- Name the second column **Class**.  
- Discard the third column.  
</p>
<p style="line-height: 200%;">
Scale the **Payment** column, which is currently in rials, to billions of tomans.  
</p>
</font>
</p>
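
Two of the steps above, splitting **Product_ID** and rescaling **Payment**, can be sketched on a couple of sample rows. The values are taken from the Transaction preview; the frame name `sample` is illustrative, and the conversion relies on 1 toman being equal to 10 rials.

```python
import pandas as pd

# Two sample rows in the same shape as the Transaction sheet
sample = pd.DataFrame({
    "Product_ID": ["R.S.102001", "R.S.102004"],
    "Payment": [2838150000, 13972400000],  # rials
})

# Split on the dot into three parts and keep only the first two
parts = sample["Product_ID"].str.split(".", expand=True)
sample["Type"] = parts[0]
sample["Class"] = parts[1]

# 1 toman = 10 rials, so one billion tomans = 10**10 rials
sample["Payment"] = sample["Payment"] / 10**10

print(sample[["Type", "Class", "Payment"]])
```

With `expand=True` the split returns one column per part, so the unwanted third part is discarded simply by not assigning it.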


In [5]:
df = Transaction[['Transaction_ID', 'Product_ID', 'Export', 'Industry', 'City_Category', 
                       'Payment', 'Customer_ID', 'Save_Date_Time', 'Bandel', 'real_weight']].copy()

# Attach customer attributes (a left join keeps every transaction)
df = df.merge(Customer[['Customer_ID', 'Province', 'Industry_Cod']], 
              on='Customer_ID', 
              how='left')

# Unit conversions
df['Payment'] = df['Payment'] * 10 ** -10   # rials -> billions of tomans
df['Bandel'] = df['Bandel'] * 2 * 10**3     # bundles -> kilograms (2 tons each)
df['Waste'] = df['Bandel'] - df['real_weight']

# Year and Month: look up the Gregorian date for the Persian order date
df['Jalali_1'] = df['Save_Date_Time'].str.split(' ').str[0]
df = df.merge(dates[['Jalali_1', 'Miladi']], on='Jalali_1', how='left')
df['Miladi'] = pd.to_datetime(df['Miladi'])
df['Year'] = df['Miladi'].dt.year
df['Month'] = df['Miladi'].dt.month
df.drop(columns=['Miladi', 'Jalali_1'], inplace=True)

# Type and Class: first two dot-separated parts of Product_ID
product_parts = df['Product_ID'].str.split('.')
df['Type'] = product_parts.str[0]
df['Class'] = product_parts.str[1]
del df['Product_ID']

df.head()

Unnamed: 0,Transaction_ID,Export,Industry,City_Category,Payment,Customer_ID,Save_Date_Time,Bandel,real_weight,Province,Industry_Cod,Waste,Year,Month,Type,Class
0,1003590,0,جوش,A,0.283815,CU00047,1400/12/23 08:47,26000,25934,تهران,15,66,2022,3,R,S
1,1005621,1,تراش,A,1.6065,CU00079,1400/12/21 10:10,26000,25940,تهران,7,60,2022,3,R,S
2,1001950,1,پرچ,C,1.72289,CU00031,1400/12/16 15:02,26000,25944,آذربایجان شرقی,12,56,2022,3,R,S
3,1001724,1,ساختمان,C,1.39724,CU00083,1400/12/16 10:03,22000,21907,آذربایجان شرقی,4,93,2022,3,R,S
4,1004480,1,ساختمان,B,1.19471,CU00001,1400/12/14 09:34,26000,25931,اصفهان,0,69,2022,3,R,S


<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">
Now that we have clean and standardized data, we would like you to answer the following questions.
</font>
</p>


<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">
We want to determine the sales amount for imports and exports for each year and each month of that year.  
Ensure that the **Year** and **Month** columns are set as the index of the dataframe.
</font>
</p>
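
The reshaping this question asks for (one row per year and month, one column per export flag) can be sketched on made-up data as follows. The frame `toy` and its values are invented, and the sketch assumes that `Export` is 1 for export orders and 0 otherwise.

```python
import pandas as pd

# Made-up rows; Export is assumed to be 1 for export orders and 0 otherwise
toy = pd.DataFrame({
    "Year":    [2020, 2020, 2020, 2020, 2021],
    "Month":   [1, 1, 1, 2, 1],
    "Export":  [0, 1, 1, 0, 1],
    "Payment": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Sum per (Year, Month, Export), then move Export into the columns
monthly = toy.groupby(["Year", "Month", "Export"])["Payment"].sum().unstack("Export")

# Months with no sales in one category show up as NaN; fill them with 0
monthly = monthly.fillna(0.0)
print(monthly)
```

After `unstack`, **Year** and **Month** remain as the two index levels, which is the layout the question requires. Filling the gaps with 0 is an optional choice for this sketch.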


In [6]:
Sales = (
    df.groupby(['Year', 'Month', 'Export'])['Payment']
    .sum()
    .unstack('Export')
    .rename(columns={0: 'Import_Sales', 1: 'Export_Sales'})
)

# Year and Month already form the index after unstack; name the levels explicitly
Sales.index.names = ['Year', 'Month']

# Show the first rows
print(Sales.head())


Export      Import_Sales  Export_Sales
Year Month                            
2019 3         112.62353     388.40085
     4        3877.70945   12906.48373
     5        4336.27345   14330.60550
     6        3414.72087   11443.02170
     7        5029.53574   15807.34280


<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">
We need the top ten customers with the highest purchase volume, including the number of purchases and the total production waste for each.  
This table should be sorted in descending order based on the total sales amount.
</font>
</p>
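
A per-customer summary followed by a top-N selection might look like the sketch below. It runs on invented data and keeps the top 2 instead of the top 10; `nlargest` is used here as one possible way to sort in descending order and truncate in a single step.

```python
import pandas as pd

# Invented purchases for three hypothetical customers
toy = pd.DataFrame({
    "Customer_ID":    ["CU1", "CU1", "CU2", "CU3", "CU3", "CU3"],
    "Transaction_ID": [1, 2, 3, 4, 5, 6],
    "Payment":        [1.0, 2.0, 10.0, 0.5, 0.5, 0.5],
    "Waste":          [60, 70, 90, 50, 55, 65],
})

# One row per customer with total payment, purchase count, and total waste
summary = toy.groupby("Customer_ID").agg(
    Total_Payment=("Payment", "sum"),
    Purchase_Count=("Transaction_ID", "count"),
    Total_Waste=("Waste", "sum"),
)

# nlargest returns the n rows with the highest Total_Payment, highest first
top2 = summary.nlargest(2, "Total_Payment")
print(top2)
```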


In [7]:
Customers = (
    df.groupby('Customer_ID')
    .agg(
        Total_Payment=('Payment', 'sum'),  
        Purchase_Count=('Transaction_ID', 'count'), 
        Total_Waste=('Waste', 'sum')            
    )
    .sort_values(by='Total_Payment', ascending=False)  
    .head(10)  
)

print(Customers)


             Total_Payment  Purchase_Count  Total_Waste
Customer_ID                                            
CU00023        3984.895920            3149       267497
CU00025        3973.675250            3220       274573
CU00033        3965.757650            3306       281501
CU00121        3961.633390            3226       274715
CU00042        3961.324220            3153       268965
CU00115        3955.172240            3173       269558
CU00006        3946.778040            3169       272183
CU00056        3932.243960            3248       276504
CU00119        3924.168310            3198       275012
CU00086        3909.184615            3154       268767


<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">
We want to determine the sales amount, the production waste, and the number of sales for each industry (machining, welding, construction, riveting, and drawing) in each **Type** and **Class** subcategory.
</font>
</p>

In [8]:
# Aggregate sales, waste, and sale count per Industry, Type, and Class
Industries = (
    df.groupby(['Industry', 'Type', 'Class'])
    .agg(
        Total_Sales=('Payment', 'sum'),       
        Total_Waste=('Waste', 'sum'),        
        Sale_Count=('Transaction_ID', 'count') 
    )
    .reset_index()  
)

# Displaying the final table
print(Industries)

   Industry Type Class    Total_Sales  Total_Waste  Sale_Count
0      تراش    R    NS   18375.667530      1300133       15261
1      تراش    R     S   28714.450050      2008561       23636
2      تراش    w     S   73206.347230      5145415       60605
3       جوش    R    NS   20466.210550      1389764       16333
4       جوش    R     S   31045.327075      2096386       24677
5       جوش    w     S   80128.746687      5441131       63878
6   ساختمان    R    NS   42826.709120      3073275       36116
7   ساختمان    R     S   64780.002982      4655752       54796
8   ساختمان    w     S  169059.542450     12132941      142698
9       پرچ    R    NS   12191.450060       831532        9757
10      پرچ    R     S   18525.906770      1282743       15071
11      پرچ    w     S   47127.900980      3285344       38595
12      کشش    R    NS    8799.718550       643128        7576
13      کشش    R     S   13236.022060       977759       11508
14      کشش    w     S   34002.783365      2516497     

<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">
We want to know the total purchase amount for each province in each year. Additionally, we need the number of purchases and the total production waste for each province.
</font>
</p>


In [9]:
Provinces = (
    df.groupby(['Year', 'Province'])
    .agg(
        Total_Sales=('Payment', 'sum'),  
        Total_Waste=('Waste', 'sum'),        
        Purchase_Count=('Transaction_ID', 'count')  
    )
    
)

# Print
print(Provinces)


                      Total_Sales  Total_Waste  Purchase_Count
Year Province                                                 
2019 آذربایجان شرقی   6872.733520       482517            5655
     اردبیل           6915.076650       488813            5760
     اصفهان          11453.073220       808842            9503
     بندرعباس        10731.201260       761924            8959
     تهران           39955.313490      2812958           33067
     خراسان جنوبی    12732.774440       876003           10292
     خوزستان          7860.298890       554990            6511
     سمنان            9833.136930       684367            8066
     سیستان          13739.441480       968521           11406
     مازندران        13601.189270       950833           11226
     مرکزی            7696.463370       547884            6429
     همدان            4734.970100       337428            3972
     کرمانشاه         8737.955740       621520            7271
     گلستان           8861.614000       624562         

<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">
In the industry, rebar can be produced either as straight bars or as coils. Straight rebar is produced in straight lengths, while coil rebar is wound into rolls. The **Type** column indicates the production form:  
- **R** represents straight bars.  
- **W** represents coils (in the outputs of this notebook the value appears as lowercase `w`).  

We want to determine the total kilograms of products produced for each **City_Category**, categorized by the **Type** (straight or coil) and their respective **Class**.
</font>
</p>
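
Category codes can differ in letter case from how they are described; the industry summary earlier in this notebook, for instance, prints the coil type as lowercase `w`. The sketch below, on invented data, shows one way to normalise the case before filtering so that both forms are kept. The frame `toy` and its weights are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "City_Category": ["A", "A", "B", "B"],
    "Type":          ["R", "w", "R", "w"],   # mixed letter case
    "Class":         ["S", "S", "NS", "S"],
    "real_weight":   [100, 200, 300, 400],
})

# Normalise the code to upper case before filtering and grouping
toy["Type"] = toy["Type"].str.upper()

totals = (
    toy[toy["Type"].isin(["R", "W"])]
    .groupby(["City_Category", "Type", "Class"], as_index=False)["real_weight"]
    .sum()
    .rename(columns={"real_weight": "Total_Weight"})
)
print(totals)
```

Without the `str.upper()` step, a case-sensitive `isin(['R', 'W'])` would silently drop every row stored as `w`.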


In [10]:
# Grouping by City_Category, Type, and Class, and calculating total weight produced
Rw = (
    df.groupby(['City_Category', 'Type', 'Class'])
    .agg(
        Total_Weight=('real_weight', 'sum')  # Total weight produced
    )
    .reset_index()  # Resetting index for better display
)

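# Keep only rows whose Type is R (straight bar) or W (coil).
# Note: this match is case-sensitive, so coils stored as lowercase 'w'
# are excluded, which is why the output below lists only R rows.
Rw = Rw[Rw['Type'].isin(['R', 'W'])]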

# Displaying the final table
print(Rw)


  City_Category Type Class  Total_Weight
0             A    R    NS     551916537
1             A    R     S     845970773
3             B    R    NS     869707675
4             B    R     S    1317519235
6             C    R    NS     638447956
7             C    R     S     978310791


<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">
As you know, quartiles are statistical terms that divide a dataset into four parts using three points. These quartiles are referred to as the first quartile, second quartile, and third quartile, commonly denoted as Q1, Q2, and Q3, respectively.  
To determine quartiles, the data must be sorted in ascending order.  
- **Q1** (the first quartile or lower quartile) is the value below which 25% of the observations fall.  
- **Q2** (the second quartile) is the median, which splits the dataset into two halves, where 50% of the observations are smaller and 50% are larger.  
- **Q3** (the third quartile or upper quartile) is the value above which 25% of the observations fall and below which 75% of the observations fall.  

<p style="line-height: 200%;">
Now, we need to categorize the values in the **Payment** column (the price paid for each order). This data should be divided into three categories:  
- Prices strictly below **Q1** should be labeled as **Low**.  
- Prices between **Q1** and **Q3** (inclusive) should be labeled as **Med**.  
- Prices above **Q3** should be labeled as **High**.  

Store these categorizations in a new column called **PriceLevel**.
</p>

<p style="line-height: 200%;">
To achieve this, define a function named **Categorize**, and use the `apply` method to apply it to the dataframe.
</p>
</font>
</p>
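
The boundary rule described above (strictly below Q1 is Low, Q1 through Q3 inclusive is Med, above Q3 is High) can be checked on a handful of made-up values. This is only a sketch: the series `prices` and the lowercase helper `categorize` are illustrative and not part of the exercise data.

```python
import pandas as pd

prices = pd.Series([1, 2, 3, 4, 5], name="Payment")

q1 = prices.quantile(0.25)  # 2.0 with the default linear interpolation
q3 = prices.quantile(0.75)  # 4.0

def categorize(x):
    """Label a value relative to the quartiles computed above."""
    if x < q1:
        return "Low"
    if x <= q3:
        return "Med"
    return "High"

levels = prices.apply(categorize)
print(levels.tolist())  # ['Low', 'Med', 'Med', 'Med', 'High']
```

Values equal to Q1 or Q3 fall into **Med**, which matches the inclusive wording of the task.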


<details dir="rtl">
<summary dir="rtl">Hint</summary>
The first and third quartiles can be obtained with quantile(0.25) and quantile(0.75).
</details>

In [11]:
Q1 = df['Payment'].quantile(0.25)
Q3 = df['Payment'].quantile(0.75)

def Categorize(x):
    if x < Q1:
        return 'Low'
    elif Q1 <= x <= Q3:
        return 'Med'
    else:
        return 'High'


In [12]:
df['PriceLevel'] = df['Payment'].apply(Categorize)

print(df)

        Transaction_ID  Export Industry City_Category   Payment Customer_ID  \
0              1003590       0      جوش             A  0.283815     CU00047   
1              1005621       1     تراش             A  1.606500     CU00079   
2              1001950       1      پرچ             C  1.722890     CU00031   
3              1001724       1  ساختمان             C  1.397240     CU00083   
4              1004480       1  ساختمان             B  1.194710     CU00001   
...                ...     ...      ...           ...       ...         ...   
550063         1002006       1  ساختمان             A  0.706640     CU00012   
550064         1002361       1     تراش             B  1.028470     CU00160   
550065         1003039       0  ساختمان             C  1.238940     CU00095   
550066         1004401       1  ساختمان             A  1.409200     CU00033   
550067         1005539       0      کشش             B  0.215280     CU00064   

          Save_Date_Time  Bandel  real_weight      

<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">
Next, use a filter operation to extract the rows of the dataframe that belong to industry codes (**Industry_Cod**) with more than **50,000 sales**.  
Store this filtered data in a variable named **Indfilter**.
</font>
</p>
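
One way to express this kind of group-size filter is pandas' `groupby(...).filter(...)`, which keeps all rows of every group that passes a test. The sketch below uses invented data and a threshold of 2 (the name `MIN_SALES` is hypothetical) in place of 50,000.

```python
import pandas as pd

toy = pd.DataFrame({
    "Industry_Cod":   [0, 0, 0, 4, 12, 12],
    "Transaction_ID": [1, 2, 3, 4, 5, 6],
})

MIN_SALES = 2  # stand-in for the 50,000 threshold

# Keep every row whose Industry_Cod group has more than MIN_SALES rows
kept = toy.groupby("Industry_Cod").filter(lambda g: len(g) > MIN_SALES)
print(kept)
```

Only code 0 has more than two sales here, so its three rows are the ones that remain. On large frames, counting per group and then selecting with `isin` gives the same rows and may run faster.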


In [13]:
# Count the number of sales per industry code
sale_counts = df.groupby('Industry_Cod')['Transaction_ID'].count()

# Industry codes with more than 50,000 sales
frequent_codes = sale_counts[sale_counts > 50000].index

# Rows of the original dataframe that belong to those codes
Indfilter = df[df['Industry_Cod'].isin(frequent_codes)]

print(Indfilter.head())


    Transaction_ID  Export Industry City_Category  Payment Customer_ID  \
2          1001950       1      پرچ             C  1.72289     CU00031   
3          1001724       1  ساختمان             C  1.39724     CU00083   
4          1004480       1  ساختمان             B  1.19471     CU00001   
8          1002950       1  ساختمان             C  1.90656     CU00155   
10         1000049       1      پرچ             C  2.27001     CU00014   

      Save_Date_Time  Bandel  real_weight        Province  Industry_Cod  \
2   1400/12/16 15:02   26000        25944  آذربایجان شرقی            12   
3   1400/12/16 10:03   22000        21907  آذربایجان شرقی             4   
4   1400/12/14 09:34   26000        25931          اصفهان             0   
8   1400/12/09 11:57   26000        25929          اردبیل             0   
10  1400/11/30 09:37   26000        25947          اردبیل            12   

    Waste  Year  Month Type Class PriceLevel  
2      56  2022      3    R     S       High  
3      93 

<p dir="ltr" style="text-align: justify; line-height: 200%; font-family: Arial; font-size: medium">
<font face="Arial" size="3">
As the final task, calculate and report the **average waste** for the **construction industry** in the year **2020** for purchases with **High** prices, normalized by the number of bundles.
</font>
</p>
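
Since **Bandel** was converted to kilograms earlier, dividing total waste by total **Bandel** gives waste per kilogram of nominal bundle weight rather than waste per bundle. The sketch below shows both readings on two sample rows; which of the two the exercise expects is not stated, so treat the per-bundle figure as an assumption. The constant `BUNDLE_WEIGHT_KG` is illustrative.

```python
import pandas as pd

BUNDLE_WEIGHT_KG = 2000  # nominal weight of one bundle, in kilograms

toy = pd.DataFrame({
    "Bandel": [26000, 22000],   # already converted to kilograms
    "Waste":  [66, 93],         # kilograms
})

# Ratio of sums: kilograms of waste per kilogram of nominal bundle weight
waste_fraction = toy["Waste"].sum() / toy["Bandel"].sum()

# The same waste expressed per bundle instead of per kilogram
bundle_count = toy["Bandel"].sum() / BUNDLE_WEIGHT_KG
waste_per_bundle_kg = toy["Waste"].sum() / bundle_count

print(waste_fraction, waste_per_bundle_kg)
```

The two numbers differ exactly by the factor `BUNDLE_WEIGHT_KG`, so either one can be derived from the other.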


In [14]:
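# Construction-industry purchases in 2020 that fall in the High price level
w2020 = df[
    (df['Industry'] == 'ساختمان') & 
    (df['Year'] == 2020) & 
    (df['PriceLevel'] == 'High') & 
    (df['Bandel'] > 0)
]

# Bandel was converted to kilograms earlier, so this ratio of sums is the
# waste per kilogram of nominal bundle weight
if not w2020.empty and w2020['Bandel'].sum() > 0:
    avg_waste_per_bandel = w2020['Waste'].sum() / w2020['Bandel'].sum()
else:
    avg_waste_per_bandel = 0

avg_waste_per_bandel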


0.0034908593627899706

In [15]:
import zipfile

# Export the results as headerless CSV files (df is cut to its first 100 rows)
df = df.head(100)
df.to_csv("df.csv", header=False)
Sales.to_csv('Sales.csv',header=False)
Customers.to_csv('Customers.csv',header=False)
Industries.to_csv('Industries.csv',header=False)
Provinces.to_csv('Provinces.csv',header=False)
Rw.to_csv('Rw.csv',header=False)
Indfilter.to_csv('Indfilter.csv',header=False)
w2020.to_csv('w2020.csv', header=False)

def compress(file_names):
    """Write the given files into result.zip using DEFLATE compression."""
    print("File Paths:")
    print(file_names)
    compression = zipfile.ZIP_DEFLATED
    with zipfile.ZipFile("result.zip", mode="w") as zf:
        for file_name in file_names:
            zf.write('./' + file_name, file_name, compress_type=compression)

file_names = ["df.csv", "Sales.csv", "Customers.csv", "Industries.csv", "Provinces.csv", "Rw.csv", "Indfilter.csv", "w2020.csv"]
compress(file_names)

File Paths:
['df.csv', 'Sales.csv', 'Customers.csv', 'Industries.csv', 'Provinces.csv', 'Rw.csv', 'Indfilter.csv', 'w2020.csv']
