# JillyAtlanta Sales Analysis
Dollars per PDF over time
***

### Gather and Preprocess Data <a class="anchor" id="gather_pp"></a>

The data comes from two sources: Etsy (www.etsy.com/market/jillyatlanta) and Shopify (https://jillyatlanta.com).

#### Etsy Dataframe <a class="anchor" id="etsy_df"></a>

Etsy splits its reports in two: order reports, which carry the sales amounts and, importantly, refunds; and item reports, which carry the item details that indicate whether a sale is a PDF sale.

1. Load the item reports, which indicate whether an item is a PDF, and build a dictionary mapping each `order_id` to `True` for PDF orders.
2. Load the reports with sales amounts and add a column for PDF status.
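
The two steps above can be sketched on a few made-up rows. This is only an illustration under assumptions: the sample values, the `amount` column, and the `is_pdf` column name are hypothetical and are not taken from the actual Etsy reports.

```python
import pandas as pd

# Hypothetical item-level rows (stand-in for the Etsy item report).
items = pd.DataFrame({
    'order_id': [101, 102, 103],
    'item_name': ['Girl Skirt Pattern, PDF sewing pattern',
                  'Baby bonnet, lace bonnet',
                  'Angelica PDF, dress pattern'],
})

# Step 1: dictionary mapping order_id -> True for orders whose item name mentions "pdf".
is_pdf_item = items['item_name'].str.contains('pdf', case=False)
pdf_lookup = {order_id: True for order_id in items.loc[is_pdf_item, 'order_id']}

# Hypothetical order-level rows (stand-in for the Etsy order report).
orders = pd.DataFrame({
    'order_id': [101, 102, 103],
    'amount': [9.50, 24.00, 10.00],
})

# Step 2: add a PDF-status column; orders absent from the lookup count as non-PDF.
orders['is_pdf'] = orders['order_id'].map(lambda oid: pdf_lookup.get(oid, False))
print(orders)
```

Looking up each `order_id` with `dict.get(..., False)` keeps the new column strictly boolean, so it can be used directly as a filter mask.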

In [1]:
import pandas as pd
import numpy as np
import glob

Get data that includes sales and refund amounts with order_id <a class="anchor" id="order_refund_df"></a>

In [2]:
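# Collect every item-level CSV report exported from Etsy
csv_paths = glob.glob('./etsy/EtsySoldOrderItems/*.csv')
# Read each report and stack them into one DataFrame (each file's row index is kept)
df = pd.concat([pd.read_csv(path) for path in csv_paths])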
#df.head()

In [3]:
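# Keep only the columns needed to classify orders, under shorter names
etsyitems = df[['Sale Date', 'Order ID', 'Item Name']].rename(
    columns={'Sale Date': 'date', 'Order ID': 'order_id', 'Item Name': 'item_name'})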
#print(len(etsyitems))
etsyitems.head()

Unnamed: 0,date,order_id,item_name
0,12/29/14,180720853,"Girl Holiday Skirt,sequined girl skirt,new yea..."
1,12/14/14,178061395,"Baby bonnet, lace bonnet, toddler bonnet, bapt..."
2,12/11/14,177033295,"Girl Sequin Skirt, holiday clothing, tulle hol..."
3,12/07/14,175486889,"Baby bonnet, lace bonnet, toddler bonnet, bapt..."
4,12/06/14,174935168,"Girl Skirt Pattern, PDF sewing pattern,easy pa..."


In [4]:
order_ids = list(etsyitems.order_id.unique())
print(len(order_ids))

12278


In [5]:
order_ids_nodups = etsyitems.drop_duplicates(subset=['order_id'], 
                                             keep='first', 
                                             inplace=False)

In [6]:
order_ids_nodups.head()

Unnamed: 0,date,order_id,item_name
0,12/29/14,180720853,"Girl Holiday Skirt,sequined girl skirt,new yea..."
1,12/14/14,178061395,"Baby bonnet, lace bonnet, toddler bonnet, bapt..."
2,12/11/14,177033295,"Girl Sequin Skirt, holiday clothing, tulle hol..."
3,12/07/14,175486889,"Baby bonnet, lace bonnet, toddler bonnet, bapt..."
4,12/06/14,174935168,"Girl Skirt Pattern, PDF sewing pattern,easy pa..."


In [7]:
order_ids, item_names = order_ids_nodups.order_id.values, order_ids_nodups.item_name.values

In [8]:
# pdf_orders: order IDs whose item name mentions "pdf" (case-insensitive)
# pdf_names: the distinct PDF item names, in order of first appearance
pdf_orders, pdf_names = [], []
for order_id, name in zip(order_ids, item_names):
    if 'pdf' in name.casefold():
        pdf_orders.append(order_id)
        if name not in pdf_names:
            pdf_names.append(name)
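
The same selection can also be written without an explicit loop. The sketch below is an alternative, not the notebook's method; it runs on a small hypothetical frame (`sample`) standing in for `order_ids_nodups`, and the `_vec` variable names are made up for the illustration.

```python
import pandas as pd

# Hypothetical stand-in for order_ids_nodups.
sample = pd.DataFrame({
    'order_id': [1, 2, 3, 4],
    'item_name': ['Girl Skirt Pattern, PDF sewing pattern',
                  'Baby bonnet, lace bonnet',
                  'Angelica PDF, dress pattern',
                  'Girl Skirt Pattern, PDF sewing pattern'],
})

# Boolean mask: True where the item name contains "pdf", ignoring case.
mask = sample['item_name'].str.contains('pdf', case=False)

# Order IDs of PDF sales, and the distinct PDF item names in first-seen order.
pdf_orders_vec = sample.loc[mask, 'order_id'].tolist()
pdf_names_vec = sample.loc[mask, 'item_name'].drop_duplicates().tolist()
print(pdf_orders_vec, len(pdf_names_vec))
```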

In [21]:
def ymd(date):
    """Convert an Etsy 'mm/dd/yy' date string to 'yyyy-mm-dd' (assumes years 2000-2099)."""
    m, d, y = date.split('/')
    return f'20{y}-{m}-{d}'
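
As an alternative to splitting the string by hand, pandas can parse the `mm/dd/yy` format directly. This is a minimal sketch on three sample dates copied from the outputs in this notebook; the variable names are hypothetical.

```python
import pandas as pd

# Sample dates in the Etsy 'mm/dd/yy' format.
raw_dates = pd.Series(['12/29/14', '06/16/14', '01/28/22'])

# Parse with an explicit format, then render as ISO 'yyyy-mm-dd' strings.
parsed = pd.to_datetime(raw_dates, format='%m/%d/%y')
iso_dates = parsed.dt.strftime('%Y-%m-%d')
print(iso_dates.tolist())
```

Keeping the parsed `datetime64` column around, rather than only the strings, also makes later resampling by month or year straightforward.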

In [22]:
# Work on an explicit copy so the new column is not assigned to a slice of etsyitems
order_ids_nodups = order_ids_nodups.copy()
order_ids_nodups['Y-m-d'] = order_ids_nodups.date.apply(ymd)
order_ids_nodups.head()

Unnamed: 0,date,order_id,item_name,Y-m-d
0,12/29/14,180720853,"Girl Holiday Skirt,sequined girl skirt,new yea...",2014-12-29
1,12/14/14,178061395,"Baby bonnet, lace bonnet, toddler bonnet, bapt...",2014-12-14
2,12/11/14,177033295,"Girl Sequin Skirt, holiday clothing, tulle hol...",2014-12-11
3,12/07/14,175486889,"Baby bonnet, lace bonnet, toddler bonnet, bapt...",2014-12-07
4,12/06/14,174935168,"Girl Skirt Pattern, PDF sewing pattern,easy pa...",2014-12-06


In [23]:
df10 = order_ids_nodups

In [24]:
df10.sort_values('Y-m-d')

Unnamed: 0,date,order_id,item_name,Y-m-d
23,06/16/14,144656004,Garden Skirt,2014-06-16
21,08/30/14,155967399,Custom order--Size 12/14,2014-08-30
20,08/31/14,156150920,"Reserved Listing: girls shorts, organic chamb...",2014-08-31
18,09/02/14,156470823,"Reserved Listing: organic cotton twill, pink s...",2014-09-02
17,09/26/14,160466268,"Reserved Listing: Girl Organic Shorts, cotton ...",2014-09-26
...,...,...,...,...
4,01/28/22,2360732074,"Marlow Add-on PDF, girl top pattern, girl pdf ...",2022-01-28
3,01/28/22,2360759003,"Angelica PDF, dress pattern, girl pattern, sew...",2022-01-28
0,01/29/22,2361635562,"Louisa Pinafore PDF, girl pinafore pdf, girl p...",2022-01-29
2,01/29/22,2361213588,"Melbourne Romper PDF, boy romper pattern, girl...",2022-01-29


In [37]:
# Record each distinct item name with the date of its first row in df10
itemi, datesi = [], []
items = df10['item_name'].values
dates = df10['Y-m-d'].values
seen = set()
for item, date in zip(items, dates):
    if item not in seen:
        seen.add(item)
        itemi.append(item)
        datesi.append(date)
print(len(datesi))

218
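
Note that `sort_values` returns a new DataFrame, so the loop above walks `df10` in its original row order and stores the date of the first row it meets for each item, which is not necessarily the earliest sale. If the goal is the earliest sale date per item (an assumption about intent), a `groupby` gives it regardless of row order. The sketch below uses a hypothetical `sample` frame with made-up dates.

```python
import pandas as pd

# Hypothetical stand-in for df10, deliberately not in chronological order.
sample = pd.DataFrame({
    'item_name': ['Garden Skirt', 'Angelica PDF', 'Garden Skirt', 'Angelica PDF'],
    'Y-m-d': ['2015-03-01', '2022-01-28', '2014-06-16', '2021-11-05'],
})

# ISO 'yyyy-mm-dd' strings sort chronologically, so min() is the earliest date.
first_sale = sample.groupby('item_name')['Y-m-d'].min().sort_values()
print(first_sale)
```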


In [38]:
for item in itemi:
    print(item)

Girl Holiday Skirt,sequined girl skirt,new years skirt,tween gold skirt,toddler holiday,special occasion,size 2T, 3T, 4T, 5, 6, 8, 10, 12
Baby bonnet, lace bonnet, toddler bonnet, baptism bonnet, off white baby hat, 1-3 mo, 3-6 mo, 6-9 mo, 9-12 mo, 12-18 mo, 18-24 mo sizes
Girl Sequin Skirt, holiday clothing, tulle holiday skirt, special occasion, sequin skirt girl, children wedding,4T, 5, 6, 7, 8, 10, 12 sizes
Girl Skirt Pattern, PDF sewing pattern,easy pattern girls,pdf girls pattern,girls skirt pdf,little girls pattern,girls pdf sewing,pdf sewing
Girl Holiday Skirt,sequined girl skirt,knee length skirt,tween gold skirt,toddler holiday,special occasion,size 2T, 3T, 4T, 5, 6, 8, 10, 12
Girl Tutu, girl tulle skirt, ballerina skirt, ballet girl clothing, children dance skirt, birthday skirt, size 2T, 3T, 4T, 5, 6, 8, 10
Girl Holiday Skirt, turquoise skirt, blue sequined skirt, tulle party skirt, hannukah skirt, special occasion, 4T, 5, 6, 7, 8, 10, 12, 14
Girl Blouse,girl polka dot top,