# Project 1: Adidas US Sales Data Analysis, Part 1

## (1) Loading and inspecting the data

In [12]:
import pandas as pd
import numpy as np

adidas = pd.read_csv('./data/Adidas US Sales Datasets.csv')
adidas.head(2)

Unnamed: 0,Retailer,Retailer ID,Invoice Date,Region,State,City,Product,Price per Unit,Units Sold,Total Sales,Operating Profit,Operating Margin,Sales Method
0,Foot Locker,1185732,2020.1.1,Northeast,New York,New York,Men's Street Footwear,$50.00,1200,"$600,000","$300,000",50%,In-store
1,Foot Locker,1185732,2020.1.2,Northeast,New York,New York,Men's Athletic Footwear,$50.00,1000,"$500,000","$150,000",30%,In-store


In [13]:
adidas['Sales Method'].unique()

array(['In-store', 'Outlet', 'Online'], dtype=object)

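Retailer: the retailer that sells Adidas products

Retailer ID: ID of the retailer

Price per Unit: price of a single item

Units Sold: number of units sold

Total Sales: sales amount

Operating Profit: operating profit (sales minus cost)

Operating Margin: operating margin (operating profit as a percentage of sales)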


In [14]:
adidas.info()  # inspect the data type of each column

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9648 entries, 0 to 9647
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Retailer          9648 non-null   object
 1   Retailer ID       9648 non-null   int64 
 2   Invoice Date      9648 non-null   object
 3   Region            9648 non-null   object
 4   State             9648 non-null   object
 5   City              9648 non-null   object
 6   Product           9648 non-null   object
 7   Price per Unit    9648 non-null   object
 8   Units Sold        9648 non-null   object
 9   Total Sales       9648 non-null   object
 10  Operating Profit  9648 non-null   object
 11  Operating Margin  9648 non-null   object
 12  Sales Method      9648 non-null   object
dtypes: int64(1), object(12)
memory usage: 980.0+ KB
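
The `object` dtype on the price, sales, and margin columns suggests the values are stored as text. A minimal sketch of one way to confirm that, using a toy frame (`sample`, a name introduced here) built from the two rows shown in the `head(2)` output rather than the full dataset:

```python
import pandas as pd

# Toy rows copied from the head(2) output above; not the full dataset.
sample = pd.DataFrame({
    "Price per Unit": ["$50.00", "$50.00"],
    "Total Sales": ["$600,000", "$500,000"],
    "Operating Margin": ["50%", "30%"],
})

# pd.to_numeric(errors="coerce") turns strings it cannot parse into NaN,
# so counting NaN per column shows how many values need cleaning first.
unparseable = sample.apply(
    lambda col: pd.to_numeric(col, errors="coerce").isna().sum()
)
print(unparseable)
```

Every value in the sample fails to parse because of the `$`, `,`, or `%` characters, which is why those characters have to be stripped before casting.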


## (2) Data preprocessing

Planned data type conversions

Retailer, Region, State, City, Product, Sales Method: keep as object

Retailer ID: int64 -> string

Invoice Date: object -> datetime

Price per Unit: object -> number (float)

Units Sold: object -> number (float)

Total Sales: object -> number (float)

Operating Profit: object -> number (float)

Operating Margin: object -> number (float)
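
The plan lists Retailer ID as an int64 to string conversion. A minimal sketch of how that could be done with `astype(str)`, on a toy frame (`df` is a name introduced here; the ID value comes from the `head()` output):

```python
import pandas as pd

# Toy frame; the ID value is taken from the head() output above.
df = pd.DataFrame({"Retailer ID": [1185732, 1185732]})

# An ID is a label, not a quantity, so storing it as text keeps it
# out of numeric summaries such as sum() or mean().
df["Retailer ID"] = df["Retailer ID"].astype(str)
print(df["Retailer ID"].tolist())
```

On the real data the same call would presumably be applied to `adidas['Retailer ID']`.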


## 1) Converting numbers stored as strings to numeric types

In [27]:
# Strip '$', ',' and '%' from each column, then cast it to float.
# Raw strings (r'...') keep '\$' from being read as an invalid escape sequence.

adidas['Price per Unit'] = adidas['Price per Unit'].str.replace(r'[\$,]', '', regex=True).str.strip().astype('float')
adidas['Units Sold'] = adidas['Units Sold'].str.replace(r'[\$,]', '', regex=True).str.strip().astype('float')
adidas['Total Sales'] = adidas['Total Sales'].str.replace(r'[\$,]', '', regex=True).str.strip().astype('float')
adidas['Operating Profit'] = adidas['Operating Profit'].str.replace(r'[\$,]', '', regex=True).str.strip().astype('float')
adidas['Operating Margin'] = adidas['Operating Margin'].str.replace(r'[\$,%]', '', regex=True).str.strip().astype('float')
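
The five conversions follow one pattern, so they could also be factored into a small helper. This is a sketch under assumptions: `to_number` and `sample` are names introduced here, and the toy values are copied from the `head()` output.

```python
import pandas as pd


def to_number(series: pd.Series) -> pd.Series:
    """Strip '$', ',' and '%' from string values, then convert to float."""
    cleaned = (
        series.astype(str)
        .str.replace(r"[$,%]", "", regex=True)  # '$' is literal inside [...]
        .str.strip()
    )
    return cleaned.astype(float)


# Toy rows copied from the head() output; not the full dataset.
sample = pd.DataFrame({
    "Price per Unit": ["$50.00", "$50.00"],
    "Total Sales": ["$600,000", "$500,000"],
    "Operating Margin": ["50%", "30%"],
})

for col in sample.columns:
    sample[col] = to_number(sample[col])

print(sample.dtypes)
```

Because the helper calls `astype(str)` first, it can be re-run on a column that is already numeric without raising an error.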


In [28]:
adidas.head(5)

Unnamed: 0,Retailer,Retailer ID,Invoice Date,Region,State,City,Product,Price per Unit,Units Sold,Total Sales,Operating Profit,Operating Margin,Sales Method
0,Foot Locker,1185732,2020.1.1,Northeast,New York,New York,Men's Street Footwear,50.0,1200.0,600000.0,300000.0,50.0,In-store
1,Foot Locker,1185732,2020.1.2,Northeast,New York,New York,Men's Athletic Footwear,50.0,1000.0,500000.0,150000.0,30.0,In-store
2,Foot Locker,1185732,2020.1.3,Northeast,New York,New York,Women's Street Footwear,40.0,1000.0,400000.0,140000.0,35.0,In-store
3,Foot Locker,1185732,2020.1.4,Northeast,New York,New York,Women's Athletic Footwear,45.0,850.0,382500.0,133875.0,35.0,In-store
4,Foot Locker,1185732,2020.1.5,Northeast,New York,New York,Men's Apparel,60.0,900.0,540000.0,162000.0,30.0,In-store


Operating Margin is a percentage, so multiply it by 0.01 to create a new variable that can be used directly in calculations.

In [29]:
adidas['Operating_Margin_rate'] = adidas['Operating Margin']*0.01
adidas.head()

Unnamed: 0,Retailer,Retailer ID,Invoice Date,Region,State,City,Product,Price per Unit,Units Sold,Total Sales,Operating Profit,Operating Margin,Sales Method,Operating_Margin_rate
0,Foot Locker,1185732,2020.1.1,Northeast,New York,New York,Men's Street Footwear,50.0,1200.0,600000.0,300000.0,50.0,In-store,0.5
1,Foot Locker,1185732,2020.1.2,Northeast,New York,New York,Men's Athletic Footwear,50.0,1000.0,500000.0,150000.0,30.0,In-store,0.3
2,Foot Locker,1185732,2020.1.3,Northeast,New York,New York,Women's Street Footwear,40.0,1000.0,400000.0,140000.0,35.0,In-store,0.35
3,Foot Locker,1185732,2020.1.4,Northeast,New York,New York,Women's Athletic Footwear,45.0,850.0,382500.0,133875.0,35.0,In-store,0.35
4,Foot Locker,1185732,2020.1.5,Northeast,New York,New York,Men's Apparel,60.0,900.0,540000.0,162000.0,30.0,In-store,0.3
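
As a sanity check on the new rate column, Total Sales multiplied by the rate should reproduce Operating Profit. A minimal sketch using only the five rows shown in the output (`check`, `expected_profit`, and `matches` are names introduced here):

```python
import numpy as np
import pandas as pd

# The five rows shown in the head() output above.
check = pd.DataFrame({
    "Total Sales": [600000.0, 500000.0, 400000.0, 382500.0, 540000.0],
    "Operating Profit": [300000.0, 150000.0, 140000.0, 133875.0, 162000.0],
    "Operating Margin": [50.0, 30.0, 35.0, 35.0, 30.0],
})
check["Operating_Margin_rate"] = check["Operating Margin"] * 0.01

# Compare with a tolerance, since 35 * 0.01 is not exactly 0.35 in floating point.
expected_profit = check["Total Sales"] * check["Operating_Margin_rate"]
matches = np.isclose(expected_profit, check["Operating Profit"])
print(matches.all())
```

This only verifies the displayed rows; whether the relationship holds across the whole dataset would need to be checked on the full frame.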


## 2) Working with time data in pandas
Convert string data in YYYY.mm.dd form to pandas datetime values.

Format string: '%Y.%m.%d'

In [None]:
adidas['Invoice Date'] = pd.to_datetime(adidas['Invoice Date'], format='%Y.%m.%d')
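
To illustrate what the conversion produces, here is a small sketch on three date strings taken from the `head()` output (`dates` and `parsed` are names introduced here). Once parsed, the `.dt` accessor exposes the date components:

```python
import pandas as pd

# Date strings in the same 'YYYY.m.d' form as the Invoice Date column.
dates = pd.Series(["2020.1.1", "2020.1.2", "2020.1.3"])

# %m and %d also accept month and day values without a leading zero.
parsed = pd.to_datetime(dates, format="%Y.%m.%d")

print(parsed.dtype)
print(parsed.dt.year.tolist(), parsed.dt.month.tolist(), parsed.dt.day.tolist())
```

With a datetime column, later steps such as grouping by year or month can use `.dt.year` and `.dt.month` directly instead of string slicing.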