In [1]:
import pandas as pd

# Sales Data

In this assignment, we'll be working with data from a fictitious sales division. Take a look at the data below.

In [2]:
sales = pd.read_csv('sales.csv')
sales

| Unnamed: 0 | Order Id | Company Id | Company Name | Date | Val | Sale | Sales Rep | Sales Rep Id |
|---|---|---|---|---|---|---|---|---|
| 0 | HZSXLI1IS9RGABZW | D0AUXPP07H6AVSGD | Melancholy Social-Role | 2017-10-13 | 6952 | 0 | William Taylor | ZTZA0ZLYZR85PTUJ |
| 1 | 582WPS3OW8T6YT0R | D0AUXPP07H6AVSGD | Melancholy Social-Role | 2017-09-02 | 7930 | 0 | William Taylor | ZTZA0ZLYZR85PTUJ |
| 2 | KRF65MQZBOYG4Y9T | D0AUXPP07H6AVSGD | Melancholy Social-Role | 2016-12-21 | 5538 | 1 | William Taylor | ZTZA0ZLYZR85PTUJ |
| 3 | N3EDZ5V1WGSWW828 | D0AUXPP07H6AVSGD | Melancholy Social-Role | 2018-06-03 | 1113 | 0 | William Taylor | ZTZA0ZLYZR85PTUJ |
| 4 | QXBC8COXEXGFSPLP | D0AUXPP07H6AVSGD | Melancholy Social-Role | 2014-07-26 | 4596 | 0 | William Taylor | ZTZA0ZLYZR85PTUJ |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 99995 | HKZFX556ZQRZJZWR | APH243SK72T90MPS | Trade-Preparatory Quarterbacks | 2017-11-06 | 7516 | 0 | Ida Woodward | LF3CPWWZKSNB1AXI |
| 99996 | 962CSDMAJ49E0CRK | APH243SK72T90MPS | Trade-Preparatory Quarterbacks | 2018-08-02 | 442 | 1 | Ida Woodward | LF3CPWWZKSNB1AXI |
| 99997 | ZW7RO9TLL6EVVJEC | APH243SK72T90MPS | Trade-Preparatory Quarterbacks | 2014-11-02 | 8544 | 0 | Ida Woodward | LF3CPWWZKSNB1AXI |
| 99998 | LNKGIWMZ9RT49IE9 | APH243SK72T90MPS | Trade-Preparatory Quarterbacks | 2017-04-01 | 6650 | 0 | Ida Woodward | LF3CPWWZKSNB1AXI |


### Important definitions:

- A sales **_lead_** refers to the data that identifies a potential buyer of a product or service.
- If a deal **_closes_**, that means that it resulted in a purchase; in other words, a _closed_ deal is a successful one.

Each row of this data represents a single lead, and includes information on – among other things – the sales rep involved (`Sales Rep`), the name of the company that the deal might close with (`Company Name`), the value of the deal in USD (`Val`), the date of the lead (`Date`), and whether or not the deal closed (`Sale`).

Use this data to answer the following questions.

### _Tips for solving these problems_

_The primary objective of this assignment is for you to learn the power of GroupBy objects. I recommend that for each problem, you stop at intermediate points during the solving process to simply print out what you have up to that point._

_For example, in #1 below, start by grouping the data by company; then, before moving on, check out your GroupBy object by printing out one or more of the groups._

_Next, move on to finding the size of each group – this is easier than you may think it is! The whole point of a GroupBy object is to make it easier to operate on multiple "sub-DataFrames" at a time. If you can figure out how to find the size of a single DataFrame, you can pretty much expect it to work the same for a group of sub-DataFrames. Many methods that work on DataFrames are simply applied group by group when you call them on a GroupBy object. If you're not sure whether your applied function worked or not, print your results and check them against your intuitions and expectations._
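
As a rough illustration of these first two steps, here is a minimal sketch on a tiny made-up DataFrame. The company names and values are invented for the example; only the column names mirror `sales.csv`.

```python
import pandas as pd

# Toy stand-in for the sales data; the names and values are made up.
toy = pd.DataFrame({
    "Company Name": ["Acme", "Acme", "Birch", "Acme", "Birch", "Cedar"],
    "Val": [100, 250, 75, 300, 120, 90],
    "Sale": [1, 0, 0, 1, 1, 0],
})

by_company = toy.groupby("Company Name")

# Inspect one "sub-DataFrame" to confirm the grouping did what you expect.
acme = by_company.get_group("Acme")
print(acme)

# len(df) counts the rows of one DataFrame; size() does that for every group.
sizes = by_company.size()
print(sizes)
```

Printing `acme` before computing `sizes` is the kind of intermediate check recommended above: you can count the rows by eye and compare them with what `size()` reports.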

_Moving on in the problem, you'll have to look up how to sort groups in descending order – this part should be more straightforward than the previous steps. After that, all you need to do is display the first ten rows!_

# 1) Most Leads

Which companies had the highest number of leads?

List out the names of the top 10, along with their lead counts, in descending order (highest count first). If there are any ties, then the order between them does not matter.

_Hint: group the data by company, find the size of each group, sort the groups in descending order, and grab the first ten rows using the `head()` function. You'll need to look up how to sort in descending order, but that part should be pretty straightforward._
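
Ties deserve a moment of thought when you take a "top N". The sketch below uses a small made-up Series of counts (not the real data) to compare two options: `sort_values(ascending=False).head(n)` always returns exactly `n` rows, while `nlargest(n, keep='all')` also keeps every entry tied with the last one.

```python
import pandas as pd

# Made-up lead counts; Birch and Cedar are tied for second place.
counts = pd.Series({"Acme": 5, "Birch": 3, "Cedar": 3, "Dune": 1}, name="leads")

# Exactly two rows; which of the tied companies appears is not guaranteed.
top2_sorted = counts.sort_values(ascending=False).head(2)

# keep="all" retains every row tied with the 2nd-largest value,
# so the result can be longer than n.
top2_ties = counts.nlargest(n=2, keep="all")

print(top2_sorted)
print(top2_ties)
```

Either approach satisfies "order between ties does not matter"; the second simply avoids dropping a tied company silently.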

In [15]:
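# Type your Problem 1 code here
companies = sales.groupby('Company Name')
# Alternative that returns exactly ten rows and breaks ties arbitrarily:
# companies.size().sort_values(ascending=False).head(10)
# keep='all' retains every company tied with the 10th-largest count,
# so the result may contain more than ten rows.
companies.size().nlargest(n=10, keep='all')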

Company Name
80Th Scorecard            41
Contemporary Gardenia     39
Shrill Co-Op              39
Smashed-Out Intercept     39
Cohnfidunt Beckman        37
Geometric Abbot           37
Accurate Kaplan           36
Mutational Fertilizer     36
Now-Famous Outcomes       36
Pedimented Bandish        36
Presidential Gus          36
Two-Fisted Interlining    36
Unadjusted Sun            36
dtype: int64

# 2) Most Closures

Which companies were the most frequent buyers? In other words, which companies had the most closures?

List out the names of the top 10, along with their closure counts, in descending order (highest count first). If there are any ties, then the order between them does not matter.

_Hint: query for all closed leads, group by company, compute the size of each group, sort in descending order, and grab the first ten rows._
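
A minimal sketch of "filter first, then group", again on invented toy data rather than `sales.csv`. It assumes, as in the table above, that `Sale` is `1` for a closed deal and `0` otherwise.

```python
import pandas as pd

# Toy data; company names are made up.
toy = pd.DataFrame({
    "Company Name": ["Acme", "Acme", "Birch", "Acme", "Birch", "Cedar"],
    "Sale": [1, 0, 0, 1, 1, 0],
})

# A boolean mask keeps only the rows where the deal closed.
closed = toy[toy["Sale"] == 1]

# Grouping the filtered rows counts closures per company.
closures = closed.groupby("Company Name").size()
print(closures)
```

Note that a company with no closed deals (Cedar here) does not appear in the result at all, because none of its rows survive the filter.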

In [19]:
# Type your Problem 2 code here
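# Keep only the leads that closed (Sale == 1), then count them per company.
closed_leads = sales[sales['Sale'] == 1]
company_closures = closed_leads.groupby('Company Name')
# keep='all' retains ties, so more than ten rows may be returned.
company_closures.size().nlargest(n=10, keep='all')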

Company Name
Canny Pompousness                     16
Particularistic-Seeming Frog-Haiku    14
Eighty-Fifth Aviv                     13
Minor Targets                         11
Sickly Phosphate                      11
Approachable Sturch                   10
Faceless Misgivings                   10
Gullible Referral                     10
Hydrostatic Amazement                 10
Life-Death Bucer                      10
Loose-Jointed Editors                 10
Now-Misplaced Saviour                 10
Oleophilic Intestines                 10
Out-Of-Bounds Pacers                  10
Rental Manual                         10
Shrill Co-Op                          10
Stiff Clothes                         10
Surreptitious Owi                     10
Three-Night Hens'                     10
Unchallenged Coachman                 10
Ungodly Commentary                    10
Unreasoning Thills                    10
Well-Armed Horse-Trail                10
dtype: int64

# 3) Most Successful

Which companies were the most successful ventures? For the purposes of this question, let "success" refer to the ratio of closures to leads. A company with a higher closure-to-lead ratio is a more successful venture.

List out the names of the top 10, along with their success ratios, in descending order (highest ratio first). If there are any ties, then the order between them does not matter.

_Hint: divide #2 by #1, sort in descending order, and grab the first ten rows._
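
When you divide one Series by another, pandas lines the values up by index label, not by position. The sketch below shows this on made-up toy data: a company with leads but no closures is missing from the numerator and comes out as `NaN`, which the sketch chooses to treat as a ratio of 0. That choice is an assumption of the example, not a requirement of the assignment.

```python
import pandas as pd

# Toy data; company names are made up.
toy = pd.DataFrame({
    "Company Name": ["Acme", "Acme", "Birch", "Acme", "Birch", "Cedar"],
    "Sale": [1, 0, 0, 1, 1, 0],
})

leads = toy.groupby("Company Name").size()
closures = toy[toy["Sale"] == 1].groupby("Company Name").size()

# Division aligns on company name; Cedar has no closures, so it is NaN
# until fillna(0) replaces it.
ratio = (closures / leads).fillna(0)

# Because Sale is 0 or 1, the per-group mean of Sale gives the same ratio.
ratio_via_mean = toy.groupby("Company Name")["Sale"].mean()

print(ratio.sort_values(ascending=False))
```

Comparing `ratio` with `ratio_via_mean` is a cheap sanity check that the division did what you intended.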

In [None]:
# Type your Problem 3 code here

# 4) Totals

What is the total number of deals that this fictitious company has managed to close? What is the total profit from those deals, taking each deal's value (`Val`) as its profit?

_Hint: query for all closed leads, and apply the `sum` function to the appropriate columns to find the total number of deals and their total profit._
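
A minimal sketch of summing columns over a filtered DataFrame, using invented toy values. It assumes `Sale` is `1` for closed deals, so summing it counts them, and that `Val` is the deal value to total.

```python
import pandas as pd

# Toy data; the values are made up.
toy = pd.DataFrame({
    "Val": [100, 250, 75, 300, 120, 90],
    "Sale": [1, 0, 0, 1, 1, 0],
})

closed = toy[toy["Sale"] == 1]

# Every remaining row has Sale == 1, so the sum equals the row count.
n_closed = closed["Sale"].sum()

# Total value of the closed deals.
total_val = closed["Val"].sum()

print(n_closed, total_val)
```

`len(closed)` should agree with `n_closed`, which makes for another quick check.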

In [None]:
# Type your Problem 4 code here