In [4]:
import pandas as pd
import numpy as np

# Business Understanding - Description of the problem and background

Cameo is a website where people can pay celebrities to record a personalized video that's roughly a minute long. Many celebrities have signed up for Cameo, from athletes to reality TV personalities to various cast members from The Office. One of the most challenging things for celebrities on Cameo (besides keeping their energy up to record 10 birthday videos back-to-back) is figuring out what price to charge for their videos.



# Data - description of the data and how it will be used to solve the problem

For this project, I will be using scraped data from two sources:

1. Cameo.com -- The website where you can purchase videos from celebrities 
2. FameFlux.com -- A website that ranks celebrities and companies by their online popularity

The Cameo data was scraped on January 21, 2021 and contains the columns below, including the celebrity's name, the Cameo video price, and the celebrity's star rating.

The Fame Flux data includes all of the columns below that begin with "FF", such as the celebrity's full name and celebrity rating. It was also gathered on January 21, 2021.
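Because every Fame Flux column shares the "FF" prefix, those columns can be selected together by name. A minimal sketch using `DataFrame.filter` with a regular expression; `cameo_sample` is a hypothetical stand-in for the full `cameo` DataFrame, built from a few values in the `cameo.head()` output:

```python
import pandas as pd

# Hypothetical stand-in for `cameo`, with values copied from cameo.head()
cameo_sample = pd.DataFrame({
    "Full name": ["Marla Maples", "Gina Rodriguez", "Nancy Kerrigan"],
    "Cameo video price": [72.0, 20.0, 50.0],
    "FF Full name": ["Marla Maples", "Gina Rodriguez", "Nancy Kerrigan"],
    "FF Profile keywords": ["Actor", "Public Figure", "Figure"],
    "FF Celebrity rating": [202, 1524, 948],
    "FF Main category rating": [90, 177, 2],
})

# Keep only the columns whose names start with "FF" (column order is preserved)
ff_columns = cameo_sample.filter(regex=r"^FF")
print(list(ff_columns.columns))
```

On the real data, the same call on `cameo` would pick up every FF column without listing them by hand.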

Link to the article where I found the data source: https://www.netcredit.com/blog/cost-effective-celebs-on-cameo/

I'll use this data to figure out how the Fame Flux data (the independent variables) can be used to determine the price of a Cameo (the dependent variable).
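One common way to relate a price to a set of numeric predictors is a linear regression. The sketch below is a minimal ordinary-least-squares fit with `numpy.linalg.lstsq`; the choice of linear regression and of `Cameo video price` as the target are assumptions for illustration, not the method this notebook commits to. It uses only the five rows copied from `cameo.head()`, so the coefficients say nothing reliable about the full dataset.

```python
import numpy as np
import pandas as pd

# Five rows copied from the cameo.head() output (illustration only; far too few for a real fit)
sample = pd.DataFrame({
    "FF Celebrity rating": [202, 1524, 948, 369, 101],
    "FF Main category rating": [90, 177, 2, 27, 43],
    "Wikipedia Traffic\n(Avg. daily)": [8608, 3526, 1816, 1558, 6460],
    "Cameo video price": [72.0, 20.0, 50.0, 133.0, 500.0],
})

features = ["FF Celebrity rating", "FF Main category rating", "Wikipedia Traffic\n(Avg. daily)"]

# Design matrix: a leading column of ones for the intercept, then the Fame Flux features
X = np.column_stack([np.ones(len(sample)), sample[features].to_numpy(dtype=float)])
y = sample["Cameo video price"].to_numpy(dtype=float)

# Ordinary least squares: choose coef to minimize ||X @ coef - y||^2
coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

for name, value in zip(["intercept"] + features, coef):
    label = name.replace("\n", " ")  # the Wikipedia column name contains a newline
    print(f"{label}: {value:.4f}")
```

With the full dataset, a library such as scikit-learn or statsmodels would add diagnostics (standard errors, R²) that this bare fit lacks.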

In [5]:
cameo = pd.read_csv('cameodata.csv')
cameo.head()

| | Full name | Category | Link | Profile keywords | Full bio | Star rating | Number of reviews | Cameo video price | Cameo chat price | Response times (hours) | Tags | FF Full name | FF Profile keywords | FF Celebrity rating | FF Main category rating | Wikipedia Traffic (Avg. daily) | Cost Effectiveness METRIC | Cost Effectiveness RANK |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Marla Maples | Actors | https://www.cameo.com/marla | Actress | For endorsement & speaking inquiries: tom@lond... | 4.98 | 83 | 72.0 | 9.99 | 76.43 | #Actors, #More, #Motivational Speakers, #Music... | Marla Maples | Actor | 202 | 90 | 8608 | 68.757 | 1 |
| 1 | Gina Rodriguez | Public Figures | https://www.cameo.com/ginarodriguez | Reality TV Star | Celebrity Manager on Mama June From Not To Hot... | 5.0 | 1 | 20.0 | 2.99 | 5.09 | #Reality TV | Gina Rodriguez | Public Figure | 1524 | 177 | 3526 | 32.808 | 2 |
| 2 | Nancy Kerrigan | Athletes | https://www.cameo.com/nancyk | Former Pro Figure Skater - Olympian - Actress | | 5.0 | 122 | 50.0 | 2.99 | 56.47 | #Athletes, #Olympics, #Winter Sports, #Women i... | Nancy Kerrigan | Figure | 948 | 2 | 1816 | 21.097 | 3 |
| 3 | LaVar Ball | Public Figures | https://www.cameo.com/lavarball | CEO of Big Baller Brand | | 4.95 | 176 | 133.0 | 2.99 | 1.01 | #Athletes, #Basketball, #Entrepreneurs, #Featu... | LaVar Ball | Public Figure | 369 | 27 | 1558 | 20.376 | 4 |
| 4 | Ice Cube | Actors | https://www.cameo.com/donmega69 | Rapper | | 4.94 | 96 | 500.0 | 2.99 | 157.43 | #Actors, #Comedy, #Featured, #Hip Hop, #Movies... | Ice Cube | Actor | 101 | 43 | 6460 | 19.802 | 5 |


## Dependent Variable - Cost per Cameo: how much a customer must pay for a Cameo

In [22]:
# Select the dependent variable alongside 'Full name' so each price can be traced to a celebrity
dependent = cameo[['Full name', 'Cameo chat price']]
dependent

| | Full name | Cameo chat price |
|---|---|---|
| 0 | Marla Maples | 9.99 |
| 1 | Gina Rodriguez | 2.99 |
| 2 | Nancy Kerrigan | 2.99 |
| 3 | LaVar Ball | 2.99 |
| 4 | Ice Cube | 2.99 |
| ... | ... | ... |
| 5992 | Emeke Egbule | 2.99 |
| 5993 | Sean Klitzner | 19.99 |
| 5994 | Kinsey Schofield | 2.99 |
| 5995 | Liz Katz | 2.99 |


### Description of the dependent variable

Cameo chat price - the chat price listed for the celebrity named in the 'Full name' column. The price of the ~1-minute personalized video itself is stored in a separate column, 'Cameo video price'.
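The `cameo.head()` output shows that the video and the chat are priced in separate columns. A minimal sketch comparing the two on those five rows; `cameo_sample` is a hypothetical stand-in for the full `cameo` DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for `cameo`, with values copied from cameo.head()
cameo_sample = pd.DataFrame({
    "Full name": ["Marla Maples", "Gina Rodriguez", "Nancy Kerrigan", "LaVar Ball", "Ice Cube"],
    "Cameo video price": [72.0, 20.0, 50.0, 133.0, 500.0],
    "Cameo chat price": [9.99, 2.99, 2.99, 2.99, 2.99],
})

# Keep the name with both price columns so they can be compared side by side
prices = cameo_sample[["Full name", "Cameo video price", "Cameo chat price"]]

# Count, mean, spread, and quartiles for each price column
print(prices[["Cameo video price", "Cameo chat price"]].describe())
```

Running the same `describe()` on the full `cameo` DataFrame is a quick way to see how differently the two prices are distributed before choosing which one to model.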

## Independent Variables - Fame Flux data -- these are the fields that will be used to determine the price per Cameo

In [21]:
# Show the Fame Flux fields alongside the celebrity's name so the values can be compared person by person
independent = cameo[['FF Full name', 'FF Celebrity rating', 'FF Main category rating', 'Wikipedia Traffic\n(Avg. daily)']]
independent

| | FF Full name | FF Celebrity rating | FF Main category rating | Wikipedia Traffic (Avg. daily) |
|---|---|---|---|---|
| 0 | Marla Maples | 202 | 90 | 8608 |
| 1 | Gina Rodriguez | 1524 | 177 | 3526 |
| 2 | Nancy Kerrigan | 948 | 2 | 1816 |
| 3 | LaVar Ball | 369 | 27 | 1558 |
| 4 | Ice Cube | 101 | 43 | 6460 |
| ... | ... | ... | ... | ... |
| 5992 | Emeke Egbule | 910416 | 79333 | 19 |
| 5993 | Sean Klitzner | 230115 | 7349 | 22 |
| 5994 | Kinsey Schofield | 250578 | 38900 | 7 |
| 5995 | Liz Katz | 910416 | 79333 | 381 |
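The Wikipedia column name contains an embedded newline (`\n`), which makes it easy to mistype when selecting it. A minimal sketch of renaming the independent variables to shorter identifiers; the new names (`ff_name`, `ff_overall_rank`, `ff_category_rank`, `wiki_daily_views`) are hypothetical choices, not part of the original data:

```python
import pandas as pd

# A few rows copied from the `independent` output above
independent = pd.DataFrame({
    "FF Full name": ["Marla Maples", "Gina Rodriguez", "Nancy Kerrigan"],
    "FF Celebrity rating": [202, 1524, 948],
    "FF Main category rating": [90, 177, 2],
    "Wikipedia Traffic\n(Avg. daily)": [8608, 3526, 1816],
})

# Hypothetical short names; each key must match the original column name exactly,
# including the newline in the Wikipedia column
renamed = independent.rename(columns={
    "FF Full name": "ff_name",
    "FF Celebrity rating": "ff_overall_rank",
    "FF Main category rating": "ff_category_rank",
    "Wikipedia Traffic\n(Avg. daily)": "wiki_daily_views",
})
print(renamed.columns.tolist())
```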


### Description of the independent variables

- FF Full name - the celebrity's full name as listed on the Fame Flux website.
- FF Celebrity rating - the celebrity's overall rank compared to all other celebrities. A rank of 1 is the most famous celebrity; the higher the number, the less famous the celebrity.
- FF Main category rating - the celebrity's rank within their own category. For example, a musician might have an overall rank (FF Celebrity rating) of 202 but a rank of 90 among musicians (FF Main category rating).
- Wikipedia Traffic\n(Avg. daily) - the average number of visits per day to the celebrity's Wikipedia page.
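Because a lower rating means a more famous celebrity, the two Fame Flux ratings run in the opposite direction from Wikipedia traffic. A minimal sketch, assuming it is useful for every feature to increase with fame, that flips each rank with a negative logarithm; the `-log(rank)` transform and the `*_fame_score` column names are hypothetical choices, not part of the Fame Flux data:

```python
import numpy as np
import pandas as pd

# Rows copied from the `independent` output above
ff = pd.DataFrame({
    "FF Full name": ["Marla Maples", "Gina Rodriguez", "Nancy Kerrigan", "LaVar Ball", "Ice Cube"],
    "FF Celebrity rating": [202, 1524, 948, 369, 101],
    "FF Main category rating": [90, 177, 2, 27, 43],
})

# Rank 1 is the most famous, so -log(rank) turns a smaller rank into a larger score
# (a rank of 1 maps to 0; every other rank maps to a negative number)
ff["overall_fame_score"] = -np.log(ff["FF Celebrity rating"])
ff["category_fame_score"] = -np.log(ff["FF Main category rating"])

# Most famous (lowest overall rank) first
ordered = ff.sort_values("overall_fame_score", ascending=False)
print(ordered[["FF Full name", "FF Celebrity rating", "overall_fame_score"]])
```

The log also compresses the long tail of ranks (the full data goes up to 910416), so a single obscure celebrity does not dominate a linear model.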

All of these indicate how famous a particular person is to the general population. Each of these numeric values will be used to determine the p