# Returns Data Exploratory Data Analysis

## Table of Contents:
* [Goal](#goal)
* [Dataset](#dataset)
    * [Importing Necessary Libraries](#import)
    * [Reading and Viewing the Dataset](#reading)
    * [Renaming Columns](#rename)
    * [Dropping Redundant Columns](#drop)
    * [Detecting Missing Values](#missing)
    * [Dropping Duplicates](#duplicates)
    * [Preprocessing Rows and Columns](#preprocessing)
* [Conclusion](#conc)

***

## Goal <a class="anchor" id="goal"></a>

We perform exploratory data analysis (EDA) on the returns data to identify its main characteristics and to understand it better before any further analysis.

***

## Dataset <a class="anchor" id="dataset"></a>

### Importing Necessary Libraries<a class="anchor" id="import"></a>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings("ignore")                              # suppressing all warnings globally
warnings.warn("this will not show")                            # sanity check: this warning is filtered out
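
Ignoring every warning for the whole notebook can also hide useful deprecation notices. As a hedged alternative, the sketch below silences warnings only around one call by using `warnings.catch_warnings()`. The function `noisy_step` is a hypothetical stand-in and is not part of the original notebook.

```python
import warnings

def noisy_step():
    """Hypothetical stand-in for a call that emits a warning."""
    warnings.warn("this would normally be shown", UserWarning)
    return 42

# Silence warnings only inside this block; the previous filters are restored on exit.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_step()

# Outside that block, record warnings to confirm they are visible again.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy_step()
```

Scoping the filter this way keeps the output tidy for a known noisy call while leaving warnings from the rest of the notebook intact.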

### Reading and Viewing the Dataset <a class="anchor" id="reading"></a>

In [None]:
df = pd.read_excel("Iadeler listesi.xlsx")

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15880 entries, 0 to 15879
Data columns (total 19 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Type                 7940 non-null   object 
 1   nOrderId             7940 non-null   float64
 2   cPostCode            7939 non-null   object 
 3   Customer ID          7940 non-null   object 
 4   ItemNumber           7938 non-null   object 
 5   ItemTitle            6276 non-null   object 
 6   dReceievedDate       7940 non-null   object 
 7   cCountry             7940 non-null   object 
 8   cCountryCode         7940 non-null   object 
 9   cCurrency            7940 non-null   object 
 10  source               7940 non-null   object 
 11  subsource            7932 non-null   object 
 12  Return Date          5102 non-null   object 
 13  ReturnQty            7940 non-null   float64
 14  Category             2593 non-null   object 
 15  ResendOrExchangeQty  7940 non-null  

In [None]:
df.head()                                                      # viewing the first 5 rows

Unnamed: 0,Type,nOrderId,cPostCode,Customer ID,ItemNumber,ItemTitle,dReceievedDate,cCountry,cCountryCode,cCurrency,source,subsource,Return Date,ReturnQty,Category,ResendOrExchangeQty,RMA Actioned,Refund Amount,Return Reason
0,,,,,,,,,,,,,,,,,,,
1,EXCHANGE,123242.0,CF11 0JE,C0012493,49215,Caimas 2980 120x170,2021-04-08 18:52:00,United Kingdom,GB,GBP,WOOCOMMERCE,https://www.the-rugs.com,19/08/2021 15:22,1.0,Size issue,1.0,1.0,0.0,small
2,,,,,,,,,,,,,,,,,,,
3,EXCHANGE,124273.0,HX4 9HP,C0000946,49035-pr,Montana 3720 brown 160X230,25/08/2021 11:19,United Kingdom,GB,GBP,WOOCOMMERCE,https://www.the-rugs.com,2021-02-09 17:13:00,1.0,Item is fine,1.0,1.0,0.0,
4,,,,,,,,,,,,,,,,,,,


In [None]:
df.shape                                                       # viewing the shape

(15880, 19)

### Renaming Columns<a class="anchor" id="rename"></a>

In [None]:
df.rename(columns={"nOrderId":"Order_id",
                   "Customer ID":"CustomerID",
                   "cPostCode":"PostCode",
                   "dReceievedDate":"ReceivedDate",
                   "cCountryCode":"CountryCode",
                   "cCountry" : "Country", 
                   "cCurrency" : "Currency",
                   'Return Date' : 'ReturnDate',
                   'Return Reason' : 'ReturnReason',
                   'RMA Actioned': 'RMAActioned',
                   'Refund Amount':'RefundAmount',
                   }, inplace=True)                           # renaming the columns

### Dropping Redundant Columns<a class="anchor" id="drop"></a>

In [None]:
df = df.drop(["CountryCode",
              'Currency',
              'ResendOrExchangeQty',
              'RMAActioned',
              'subsource'], axis = 1)
                                                              # dropping redundant columns

### Detecting Missing Values <a class="anchor" id="missing"></a>

In [None]:
df.isnull().sum()                                             # checking the missing values

Type             7940
Order_id         7940
PostCode         7941
CustomerID       7940
ItemNumber       7942
ItemTitle        9604
ReceivedDate     7940
Country          7940
source           7940
ReturnDate      10778
ReturnQty        7940
Category        13287
RefundAmount     7940
ReturnReason    13376
dtype: int64
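
Most columns report 7,940 missing values, which is exactly half of the 15,880 rows and matches the alternating empty rows visible in `df.head()`. Expressing the counts as percentages makes this pattern easier to spot. The sketch below assumes a small stand-in frame with illustrative values; the helper `missing_summary` is hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Tiny stand-in frame; the values are illustrative, not taken from the returns file.
sample = pd.DataFrame({
    "Type": ["EXCHANGE", np.nan, "REFUND", np.nan],
    "Category": ["Size issue", np.nan, np.nan, np.nan],
})

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    counts = frame.isnull().sum()
    percent = (counts / len(frame) * 100).round(1)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    # Columns with the most missing values come first.
    return summary.sort_values("missing", ascending=False)

summary = missing_summary(sample)
print(summary)
```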

### Dropping Duplicates <a class="anchor" id="duplicates"></a>

In [None]:
df.drop_duplicates(inplace=True)                            # dropping duplicates         

In [None]:
df.drop(df.index[0], inplace=True)                          # dropping the one all-NaN row left after de-duplication

In [None]:
df.reset_index(drop=True, inplace=True)
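
Dropping the first row by position works here only because the first row of the frame happens to be an all-NaN row. A sketch of a more explicit variant, assuming the empty rows are missing in every column, uses `dropna(how="all")` before de-duplicating. The sample values below are illustrative only.

```python
import numpy as np
import pandas as pd

# Stand-in frame with empty rows and one exact duplicate (illustrative values).
sample = pd.DataFrame({
    "Type": [np.nan, "EXCHANGE", np.nan, "REFUND", "REFUND"],
    "Order_id": [np.nan, 1.0, np.nan, 2.0, 2.0],
})

cleaned = (
    sample
    .dropna(how="all")        # remove rows where every column is missing
    .drop_duplicates()        # then remove exact duplicate rows
    .reset_index(drop=True)   # renumber the remaining rows from 0
)
```

This version does not depend on where the empty rows sit in the file, so it stays correct if the row order changes.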

### Preprocessing Rows and Columns <a class="anchor" id="preprocessing"></a>

In [None]:
df["ReturnReason"] = df["ReturnReason"].fillna("No info")
                  # filling missing values in the ReturnReason column with "No info", since a missing reason carries no information

In [None]:
df.ReturnReason = df.ReturnReason.replace(['no info', ' no info', 'NO INFO',
                                           '..', 1231, 'no reason','no idea', 'NO İNFO','nn'],
                                          ['No info', 'No info', 'No info',
                                           'No info', 'No info', 'No info', 
                                           'No info', 'No info', 'No info'])
                  # mapping every other spelling of "no info" to the single label "No info"
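
Listing each spelling by hand misses variants that differ only in case or surrounding whitespace. A minimal sketch of a normalising approach follows, assuming the placeholder set below covers the values seen in this column. The names `PLACEHOLDERS` and `normalise_reason` are hypothetical.

```python
import pandas as pd

# Placeholder spellings, compared after trimming and lower-casing.
# Assumption: this set mirrors the variants handled in the cell above.
PLACEHOLDERS = {"no info", "..", "1231", "no reason", "no idea", "nn"}

def normalise_reason(value):
    """Map placeholder-like return reasons to 'No info'; leave real reasons untouched."""
    if pd.isna(value):
        return "No info"
    # Fold the Turkish dotted capital I to a plain "I" before lower-casing.
    key = str(value).strip().replace("İ", "I").lower()
    return "No info" if key in PLACEHOLDERS else value

reasons = pd.Series([" no info", "NO İNFO", 1231, None, "small", "nn"])
cleaned = reasons.map(normalise_reason)
```

Real reasons such as `small` pass through unchanged, so the mapping only touches values that carry no information.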

***

In [None]:
df_amazon_returns = df[df.CustomerID == 'C0000001']                    # separating Amazon returns into a new dataframe

In [None]:
df_other_returns = df[(df.CustomerID != "C0000001")]                   # separating the rest of the returns into a new dataframe

In [None]:
df_amazon_returns.reset_index(inplace=True, drop=True)                 # resetting the index and dropping the previous one
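
The two subsets are complements of each other, so the condition can be computed once and negated. The sketch below assumes, as the cells above do, that customer ID `C0000001` identifies Amazon returns; the sample rows are illustrative.

```python
import pandas as pd

# Stand-in frame; only the CustomerID values follow the notebook's convention.
sample = pd.DataFrame({
    "CustomerID": ["C0000001", "C0012493", "C0000001", "C0000946"],
    "ReturnQty": [1.0, 1.0, 2.0, 1.0],
})

# Compute the boolean mask once and reuse it for both subsets.
is_amazon = sample["CustomerID"] == "C0000001"

amazon_returns = sample[is_amazon].reset_index(drop=True)
other_returns = sample[~is_amazon].reset_index(drop=True)
```

Calling `reset_index(drop=True)` on both results gives each subset a fresh index and returns new frames rather than modifying a slice in place.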

***

In [None]:
df[df.ItemNumber.str.endswith('my', na=False)].ItemTitle.value_counts()  # cross-checking the titles of item numbers that end with "my"

CAIMAS 5400 160X230                13
CAIMAS 5000 120X170                12
CAIMAS 5200 120X170                11
CAIMAS 5400 120X170                11
CAIMAS 5000 160X230                11
CARINA 6921 Grey 160x230           10
CAIMAS 2990 160X230                10
CARINA 6921 Grey 120x170            8
CARINA 6900 Green 160x230           7
CAIMAS 2990 80X150                  7
CARINA 6901 Grey 80x150             6
CARINA 6901 Grey 120x170            6
CARINA 6922 Green 120x170           6
CAIMAS 5400 80X300                  6
CARINA 6920 Pink 120x170            5
CAIMAS 5400 180X270                 5
LARA 704 Cream 160X230              4
CARINA 6941 Pink 120x170            4
CARINA 6941 Pink 160x230            4
CARINA 6952 Dark grey 120x170       4
LARA 803 120X170                    4
CARINA 6900 Green 120x170           3
CAIMAS 2990 120X170                 3
CAIMAS 6100 120X170                 3
LARA 703 160X230                    3
CARINA 6930 Black 160x230           2
CARINA 6902 

In [None]:
df['ItemTitle'] = df['ItemTitle'].fillna('unknown')    # labelling missing values in the ItemTitle column as unknown

***

In [None]:
df['RugName'] = df['ItemTitle'].str.split().str[0]
                                                       # creating a RugName column from the first word of ItemTitle

In [None]:
df['RugName']                                          # displaying the RugName column

0         Caimas
1        Montana
2        unknown
3        unknown
4        unknown
          ...   
7933      Shaggy
7934     Rapsody
7935     Gustavo
7936    Marakech
7937    Marakech
Name: RugName, Length: 7938, dtype: object
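
The item titles mix upper and lower case (for example `Caimas 2980 120x170` and `CAIMAS 5400 160X230`), so the first word alone would count the same rug under two names. A hedged sketch of extracting the name with consistent casing follows. Capitalising the names is an assumption about the desired output, not something the original notebook does.

```python
import pandas as pd

titles = pd.Series(["Caimas 2980 120x170", "CAIMAS 5400 160X230", "unknown", None])

rug_name = (
    titles
    .fillna("unknown")              # same label the notebook uses for missing titles
    .str.split()                    # split each title on whitespace
    .str[0]                         # first word; yields NaN instead of raising on an empty title
    .str.capitalize()               # "CAIMAS" and "Caimas" become the same value
    .replace("Unknown", "unknown")  # keep the placeholder label in its original form
)
```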

***

In [None]:
df["ReceivedDate"] = pd.to_datetime(df["ReceivedDate"])  # changing the ReceivedDate column's data type to datetime

In [None]:
df["ReturnDate"] = pd.to_datetime(df["ReturnDate"])      # changing the ReturnDate column's data type to datetime
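
The sample rows show two layouts within the same column: `19/08/2021 15:22` (day first) and `2021-04-08 18:52:00`. Letting `pd.to_datetime` infer the format can swap day and month for ambiguous values without any error. One sample row has a return date earlier than its received date, which may point to such a swap, although the data alone does not confirm it. Below is a minimal sketch that parses each layout explicitly, assuming only these two layouts occur; the helper `parse_mixed_dates` is hypothetical.

```python
import pandas as pd

def parse_mixed_dates(raw: pd.Series) -> pd.Series:
    """Parse a column that mixes 'dd/mm/YYYY HH:MM' and 'YYYY-mm-dd HH:MM:SS' values."""
    # Work on text so strings and datetime objects are handled the same way.
    text = raw.astype(str).str.strip()
    # Each pass turns values that do not match its layout into NaT.
    day_first = pd.to_datetime(text, format="%d/%m/%Y %H:%M", errors="coerce")
    iso = pd.to_datetime(text, format="%Y-%m-%d %H:%M:%S", errors="coerce")
    # Prefer the day-first result and fall back to the ISO result.
    return day_first.fillna(iso)

raw = pd.Series(["19/08/2021 15:22", "2021-04-08 18:52:00", None, "not a date"])
parsed = parse_mixed_dates(raw)
```

Values that match neither layout end up as `NaT`, which makes them easy to count and inspect with `parsed.isna().sum()`. pandas 2.0 and later also accept `format="mixed"`, at the cost of guessing the layout of each value individually.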

***

In [None]:
df.Category = df.Category.replace(['Untidy', 'wont fit in machine', 'Shiny',
                                   'Price issue', 'Quality/ size issue', 'Item fine', 'washing issue'],
                                  ['Untidy item', 'Size issue', 'Colour issue',
                                   'Quality issue', 'Quality issue', 'Item is fine', 'Quality issue'])

                                                       # narrowing down categories in the category column

In [None]:
df.Category = df.Category.replace(np.nan, 'No info')   # replacing missing values with the label "No info"

In [None]:
df.Category.value_counts(dropna=False)                 # displaying unique values and their counts in Category column

No info                            5898
Size issue                          364
Didnt like                          344
Colour issue                        317
Quality issue                       299
Delivery issue                      138
Wrong Item sent                      99
Accidentally ordered                 78
No need                              77
Defect                               71
Item is fine                         59
Thin                                 45
Colour/quality issue                 41
Not flat                             28
Defect/ quality issue                27
Ordered multiple to choose from      18
Untidy item                          18
Exchange                             17
Name: Category, dtype: int64
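
Raw counts are easier to compare as shares of all rows. The sketch below reuses the top five counts printed above and the frame length of 7,938 reported earlier. It only illustrates the arithmetic and does not use the full column.

```python
import pandas as pd

# Counts copied from the value_counts() output above (top five labels only).
counts = pd.Series({
    "No info": 5898,
    "Size issue": 364,
    "Didnt like": 344,
    "Colour issue": 317,
    "Quality issue": 299,
})
total_rows = 7938  # length of the cleaned frame

# Percentage of all rows that carry each label, rounded to one decimal place.
share = (counts / total_rows * 100).round(1)
print(share)
```

Roughly three quarters of the rows have no category, so any analysis by category describes only the remaining quarter. On the full frame, `df.Category.value_counts(normalize=True)` returns the same proportions directly.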

***

In [None]:
df.Type = df.Type.replace("RETURNBOOKING", "RETURN")                           # merging RETURNBOOKING into RETURN

***


## Conclusion <a class="anchor" id="conc"></a>

As a team, we went over every feature throughout our exploratory data analysis, then cleaned, adjusted, and organized the data. It is now ready for further analysis.

In [None]:
df

Unnamed: 0,Type,Order_id,PostCode,CustomerID,ItemNumber,ItemTitle,ReceivedDate,Country,source,ReturnDate,ReturnQty,Category,RefundAmount,ReturnReason,RugName
0,EXCHANGE,123242.0,CF11 0JE,C0012493,49215,Caimas 2980 120x170,2021-04-08 18:52:00,United Kingdom,WOOCOMMERCE,2021-08-19 15:22:00,1.0,Size issue,0.00,small,Caimas
1,EXCHANGE,124273.0,HX4 9HP,C0000946,49035-pr,Montana 3720 brown 160X230,2021-08-25 11:19:00,United Kingdom,WOOCOMMERCE,2021-02-09 17:13:00,1.0,Item is fine,0.00,No info,Montana
2,REFUND,100064.0,BT23 4PX,C0001051,153761726783-1958345997005,unknown,2019-12-21 13:45:00,United Kingdom,EBAY,NaT,0.0,No info,0.00,No info,unknown
3,REFUND,100557.0,KW1 4YL,C0001314,153525972948-1992856209005,unknown,2020-03-29 13:14:00,United Kingdom,EBAY,NaT,0.0,No info,0.00,No info,unknown
4,REFUND,100773.0,NG3 6PD,C0001313,153525972948-1997358422005,unknown,2020-08-04 11:57:00,United Kingdom,EBAY,NaT,0.0,No info,0.00,No info,unknown
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7933,RETURN,137836.0,wd187uh,C0030666,48162,Shaggy 381 turkis 140X200,2022-03-02 12:00:00,United Kingdom,EBAY,2022-09-02 12:22:00,1.0,Quality issue,0.00,Quality issue,Shaggy
7934,RETURN,138454.0,CV3 1NB,C0013520,2475,Rapsody 210 D-W 70x250,2022-08-02 16:36:00,United Kingdom,EBAY,2022-10-02 10:55:00,1.0,No info,74.90,No info,Rapsody
7935,RETURN,138459.0,WV1 3RN,C0005820,2067,Gustavo 3222 brown 200X290,2022-08-02 17:05:00,United Kingdom,EBAY,2022-10-02 17:26:00,1.0,No info,0.00,No info,Gustavo
7936,RETURN,138479.0,AL2 3LG,C0015078,2679,Marakech 430 120x170,2022-08-02 20:04:00,United Kingdom,EBAY,2022-10-02 14:55:00,1.0,Didnt like,47.41,didnt like,Marakech
