# Instructions

## Description

Open the file ks-projects-201801.csv, which lists about 100,000 Kickstarter projects. Load the data into a MongoDB database directly through the Python API.

Be sure to set each document's ID manually. Also take care to cast the data to appropriate types so that you can take advantage of MongoDB's built-in methods. You do not necessarily need the whole dataset; designing the data model is up to you.
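
As a rough illustration of the two requirements above (an explicit `_id` and typed fields), the sketch below converts one CSV row into a document. The helper name `to_document` and the choice of fields are assumptions made for this example, not part of the assignment; the sample values come from the first row of the file shown further down.

```python
from datetime import datetime

def to_document(row: dict) -> dict:
    """Convert one CSV row (all values as strings) into a typed document."""
    return {
        "_id": int(row["ID"]),  # reuse the Kickstarter ID as the document ID
        "name": row["name"],
        "category": row["category"],
        "country": row["country"],
        "state": row["state"],
        "goal": float(row["goal"]),
        "pledged": float(row["pledged"]),
        "backers": int(row["backers"]),
        # Dates are stored as datetime objects so they can be compared in queries.
        "launched": datetime.strptime(row["launched"], "%Y-%m-%d %H:%M:%S"),
        "deadline": datetime.strptime(row["deadline"], "%Y-%m-%d"),
    }

row = {
    "ID": "872782264",
    "name": 'Scott Cooper\'s Solo CD "A Leg Trick" (Canceled)',
    "category": "Rock",
    "country": "US",
    "state": "canceled",
    "goal": "2000",
    "pledged": "1145",
    "backers": "24",
    "launched": "2011-08-17 06:31:31",
    "deadline": "2011-09-16",
}
doc = to_document(row)
print(doc)
```

Documents built this way could then be passed to `collection.insert_many`.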

## Questions

- 1) Retrieve the 5 projects that received the most pledges.
- 2) Count the number of projects that reached their goal.
- 3) Count the number of projects in each category.
- 4) Count the number of French projects launched before 2016.
- 5) Retrieve the American projects that asked for more than 200,000 dollars.
- 6) Count the number of projects with "Sport" in their name.

In [19]:
import pandas as pd
import pymongo

In [20]:
client = pymongo.MongoClient('mongodb://localhost:27017/')
database = client['exercices']
collection = database['kickstarter']

In [21]:
df_ks = pd.read_csv("./data/ks-projects-201801-sample.csv")
df_ks.head()

  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,


Unnamed: 0,ID,name,category,main_category,currency,deadline,goal,launched,pledged,state,backers,country,usd pledged,usd_pledged_real
0,872782264,"Scott Cooper's Solo CD ""A Leg Trick"" (Canceled)",Rock,Music,USD,2011-09-16,2000,2011-08-17 06:31:31,1145,canceled,24,US,1145.0,1145.0
1,1326492673,Ohceola jewelry,Fashion,Fashion,USD,2012-08-22,18000,2012-07-23 20:46:48,1851,failed,28,US,1851.0,1851.0
2,1688410639,Sluff Off & Harald: Two latest EGGs are Classi...,Tabletop Games,Games,USD,2016-07-19,2000,2016-07-01 21:55:54,7534,successful,254,US,3796.0,7534.0
3,156812982,SketchPlanner: Create and Plan- all in one bea...,Art Books,Publishing,USD,2017-09-27,13000,2017-08-28 15:47:02,16298,successful,367,US,2670.0,16298.0
4,1835968190,Proven sales with custom motorcycle accessories,Sculpture,Art,CAD,2016-02-24,5000,2016-01-25 17:37:10,1,failed,1,CA,0.708148,0.738225


This warning is raised when pandas cannot infer a single data type for a column. Helpfully, it names the affected columns: 6, 8, 10 and 12.

In [22]:
df_ks.columns[[6,8,10,12]]

Index(['goal', 'pledged', 'backers', 'usd pledged'], dtype='object')
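
One way to handle such columns is to coerce them to numbers before inserting. The following is a minimal sketch on a tiny made-up CSV. The values are invented for illustration, and it assumes the mixed types come from stray non-numeric entries, which the notebook does not show. `pd.to_numeric` with `errors="coerce"` turns anything it cannot parse into `NaN`.

```python
import io
import pandas as pd

# Made-up data: the second 'goal' value is not a number.
raw = io.StringIO("ID,goal,backers\n1,2000,24\n2,oops,28\n3,5000.5,1\n")
df = pd.read_csv(raw)

# Coerce the columns that should be numeric; unparseable values become NaN.
for col in ["goal", "backers"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Drop the rows whose goal could not be parsed.
clean = df.dropna(subset=["goal"])
print(clean.dtypes)
```

Applied to `goal`, `pledged`, `backers` and `usd pledged`, the same idea would store numbers rather than strings in MongoDB.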

## Question 0

### Clean the data

In [23]:
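# Parse the launch timestamps; values that cannot be parsed become NaT
df_ks['launched'] = pd.to_datetime(df_ks['launched'], errors='coerce')
# Drop the rows without a valid launch date
df_ks = df_ks.dropna(subset=['launched'])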

In [24]:
# MongoDB uses '_id' as the document identifier, so reuse the Kickstarter ID
df_ks = df_ks.rename(columns={'ID': '_id'})
df_ks.dtypes

_id                          int64
name                        object
category                    object
main_category               object
currency                    object
deadline                    object
goal                        object
launched            datetime64[ns]
pledged                     object
state                       object
backers                     object
country                     object
usd pledged                 object
usd_pledged_real           float64
dtype: object

### Import the data

In [25]:
collection.insert_many(df_ks.to_dict('records'))
collection.find_one()

{'_id': 872782264,
 'name': 'Scott Cooper\'s Solo CD "A Leg Trick" (Canceled)',
 'category': 'Rock',
 'main_category': 'Music',
 'currency': 'USD',
 'deadline': '2011-09-16',
 'goal': 2000.0,
 'launched': datetime.datetime(2011, 8, 17, 6, 31, 31),
 'pledged': 1145.0,
 'state': 'canceled',
 'backers': 24,
 'country': 'US',
 'usd pledged': 1145.0,
 'usd_pledged_real': 1145.0}

## Question 1  

In [26]:
high_goal=collection.find().sort([('goal',-1)]).limit(5)
list(high_goal)

[{'_id': 1693637411,
  'name': "GTA5 Devin Westin's house in real life (Canceled)",
  'category': 'Design',
  'main_category': 'Design',
  'currency': 'EUR',
  'deadline': '2015-03-21',
  'goal': '999999.0',
  'launched': datetime.datetime(2015, 1, 21, 0, 17, 43),
  'pledged': '0.0',
  'state': 'canceled',
  'backers': '0',
  'country': 'IE',
  'usd pledged': '0.0',
  'usd_pledged_real': 0.0},
 {'_id': 554888187,
  'name': "'Laborer App' (Canceled)",
  'category': 'Interactive Design',
  'main_category': 'Design',
  'currency': 'USD',
  'deadline': '2015-04-12',
  'goal': '999999.0',
  'launched': datetime.datetime(2015, 3, 13, 10, 59, 49),
  'pledged': '11.0',
  'state': 'canceled',
  'backers': '10',
  'country': 'US',
  'usd pledged': '11.0',
  'usd_pledged_real': 11.0},
 {'_id': 944950980,
  'name': 'Clouday: The Cheapest Cloud Hosting in the World',
  'category': 'Technology',
  'main_category': 'Technology',
  'currency': 'GBP',
  'deadline': '2014-09-20',
  'goal': '99999.0',
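
Note that the query above sorts on `goal`, while the question is about pledges, and the output shows that some `goal` values were stored as strings (`'999999.0'`). In MongoDB's comparison order, strings rank above numbers, so they surface first in a descending sort. A possible alternative, sketched below, sorts on `usd_pledged_real`, which pandas read as `float64`. The helper name `top_pledged` and the projection are assumptions for illustration; `collection` is the PyMongo collection defined earlier.

```python
# Sort specification: highest USD pledge total first (-1 means descending).
TOP_PLEDGED_SORT = [("usd_pledged_real", -1)]

def top_pledged(collection, n=5):
    """Return the n projects with the largest 'usd_pledged_real' value."""
    # Project only a few fields to keep the output readable.
    projection = {"name": 1, "usd_pledged_real": 1, "state": 1}
    cursor = collection.find({}, projection).sort(TOP_PLEDGED_SORT).limit(n)
    return list(cursor)

# With the notebook's collection:
# top_pledged(collection)
```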
  

## Question 2

In [27]:
# Cursor.count() is deprecated; count_documents() takes the same filter
collection.count_documents({'state': 'successful'})


53040
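
The count above relies on the `state` field. If "reached their goal" is instead read as "pledged at least the goal amount", a filter using `$expr` could compare the two fields directly. This is only a sketch under that alternative reading, and it assumes `pledged` and `goal` are both stored as numbers; the name `reached_goal_filter` is made up for the example.

```python
def reached_goal_filter():
    """Filter matching documents whose pledged amount is at least the goal."""
    # $expr lets a query compare two fields of the same document.
    return {"$expr": {"$gte": ["$pledged", "$goal"]}}

# With the notebook's collection:
# collection.count_documents(reached_goal_filter())
print(reached_goal_filter())
```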

## Question 3

In [28]:
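# Group the documents by category and add 1 for each project
projects_per_category = collection.aggregate([
    {'$group': {'_id': '$category', 'number_project': {'$sum': 1}}}
])
list(projects_per_category)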

[{'_id': 'Food', 'number_project': 4612},
 {'_id': 'Community Gardens', 'number_project': 115},
 {'_id': 'Metal', 'number_project': 274},
 {'_id': 'Wearables', 'number_project': 508},
 {'_id': 'Art', 'number_project': 3358},
 {'_id': 'Indie Rock', 'number_project': 2192},
 {'_id': 'Kids', 'number_project': 109},
 {'_id': 'Taxidermy', 'number_project': 7},
 {'_id': 'Performances', 'number_project': 414},
 {'_id': 'Publishing', 'number_project': 2332},
 {'_id': 'Apparel', 'number_project': 2827},
 {'_id': 'Classical Music', 'number_project': 1064},
 {'_id': 'Graphic Design', 'number_project': 765},
 {'_id': 'Cookbooks', 'number_project': 217},
 {'_id': 'Bacon', 'number_project': 78},
 {'_id': 'Fashion', 'number_project': 3379},
 {'_id': 'Footwear', 'number_project': 379},
 {'_id': 'Knitting', 'number_project': 78},
 {'_id': 'Music Videos', 'number_project': 299},
 {'_id': 'Art Books', 'number_project': 1065},
 {'_id': 'Faith', 'number_project': 439},
 {'_id': 'Conceptual Art', 'number_pr
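
The aggregation returns the categories in no particular order. To make the result easier to read, a `$sort` stage can be appended. The sketch below builds such a pipeline; the function name `category_count_pipeline` is an assumption, while the field name `number_project` is the one used above.

```python
def category_count_pipeline(field="category"):
    """Build a pipeline counting projects per value of `field`, largest first."""
    return [
        # One group per distinct value, adding 1 for each document.
        {"$group": {"_id": f"${field}", "number_project": {"$sum": 1}}},
        # Most frequent values first.
        {"$sort": {"number_project": -1}},
    ]

# With the notebook's collection:
# list(collection.aggregate(category_count_pipeline()))
pipeline = category_count_pipeline()
print(pipeline)
```

Passing `"main_category"` instead would count projects per top-level category.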

## Question 4

In [29]:
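import datetime
# Cutoff: 1 January 2016 at 01:00, written with decimal literals
standard = datetime.datetime(2016, 1, 1, 1)
#french=collection.find({"launched":{"$lte": standard}})
# Keep the French projects launched up to the cutoff, then group them by name
nb_projectfr = collection.aggregate([
    {'$match': {'country': 'FR', 'launched': {'$lte': standard}}},
    {'$group': {'_id': '$name', 'date': {'$max': '$launched'}}}
])
# One group per distinct project name
len(list(nb_projectfr))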

330
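
Grouping by name counts distinct project names rather than documents, and the cutoff above falls at 01:00 on 1 January 2016. A more direct count, sketched here under the assumption that "before 2016" means strictly before midnight on 1 January 2016, uses `count_documents` with `$lt`. The helper name `french_before_filter` is hypothetical.

```python
import datetime

def french_before_filter(year=2016):
    """Filter for French projects launched strictly before 1 January of `year`."""
    cutoff = datetime.datetime(year, 1, 1)  # midnight at the start of the year
    return {"country": "FR", "launched": {"$lt": cutoff}}

# With the notebook's collection:
# collection.count_documents(french_before_filter())
print(french_before_filter())
```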

## Question 5

In [30]:
america = collection.find({"$and":[{"goal":{"$gte": 200000}}, {"country":"US"}]})
list(america)

[{'_id': 655043686,
  'name': 'Far from Par is a movie about a man and a talking golf ball.',
  'category': 'Comedy',
  'main_category': 'Film & Video',
  'currency': 'USD',
  'deadline': '2014-12-05',
  'goal': 200000.0,
  'launched': datetime.datetime(2014, 10, 6, 21, 20, 6),
  'pledged': 10.0,
  'state': 'failed',
  'backers': 2,
  'country': 'US',
  'usd pledged': 10.0,
  'usd_pledged_real': 10.0},
 {'_id': 866634482,
  'name': 'A CALL TO ADVENTURE',
  'category': 'Film & Video',
  'main_category': 'Film & Video',
  'currency': 'USD',
  'deadline': '2012-09-14',
  'goal': 287000.0,
  'launched': datetime.datetime(2012, 8, 13, 23, 14, 2),
  'pledged': 1465.0,
  'state': 'failed',
  'backers': 11,
  'country': 'US',
  'usd pledged': 1465.0,
  'usd_pledged_real': 1465.0},
 {'_id': 993194166,
  'name': 'Storybricks, the storytelling online RPG',
  'category': 'Video Games',
  'main_category': 'Games',
  'currency': 'USD',
  'deadline': '2012-06-01',
  'goal': 250000.0,
  'launched': da
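
Two details may be worth adjusting here. The question asks for more than 200,000 dollars, which would be `$gt` rather than `$gte` (the first result above has a goal of exactly 200000.0). Also, conditions listed in a single filter document are combined with an implicit AND, so `$and` is optional. A sketch, with the hypothetical helper name `big_us_goal_filter`:

```python
def big_us_goal_filter(threshold=200_000):
    """Filter for US projects whose goal is strictly greater than `threshold`."""
    # Keys of one filter document are ANDed together implicitly.
    return {"country": "US", "goal": {"$gt": threshold}}

# With the notebook's collection:
# list(collection.find(big_us_goal_filter()))
print(big_us_goal_filter())
```

The numeric comparison only matches documents whose `goal` is stored as a number.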

## Question 6 

In [31]:
sport=collection.find({'name':{'$regex':'.*Sport.*'}})
list(sport)

[{'_id': 802281658,
  'name': 'Sportswear range',
  'category': 'Apparel',
  'main_category': 'Fashion',
  'currency': 'AUD',
  'deadline': '2014-08-23',
  'goal': 25000.0,
  'launched': datetime.datetime(2014, 7, 24, 5, 14, 52),
  'pledged': 20.0,
  'state': 'failed',
  'backers': 1,
  'country': 'AU',
  'usd pledged': 18.7569048,
  'usd_pledged_real': 18.675880100849763},
 {'_id': 1838460041,
  'name': 'Mount Systems for Recreation Sports & Film (GoPro) Lighting',
  'category': 'Gadgets',
  'main_category': 'Technology',
  'currency': 'USD',
  'deadline': '2015-06-16',
  'goal': 30000.0,
  'launched': datetime.datetime(2015, 5, 12, 18, 34, 28),
  'pledged': 12442.0,
  'state': 'failed',
  'backers': 11,
  'country': 'US',
  'usd pledged': 12442.0,
  'usd_pledged_real': 12442.0},
 {'_id': 767518055,
  'name': 'E-GoBox: Revolutionary Sports Capsule Dispenser',
  'category': 'Product Design',
  'main_category': 'Design',
  'currency': 'USD',
  'deadline': '2017-08-09',
  'goal': 60000.0
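
The question asks for a count, whereas the cell above lists the matching documents. Since `$regex` matches anywhere in the string unless anchored, the pattern `Sport` alone should behave like `.*Sport.*`. The sketch below builds the filter and checks the pattern with Python's `re` module; `sport_filter` is a made-up helper name.

```python
import re

def sport_filter(word="Sport"):
    """Filter for names containing `word` (case-sensitive substring match)."""
    return {"name": {"$regex": re.escape(word)}}

# With the notebook's collection:
# collection.count_documents(sport_filter())

# Local check: two names from the Question 6 output, one from the file head.
names = [
    "Sportswear range",
    "Mount Systems for Recreation Sports & Film (GoPro) Lighting",
    "Ohceola jewelry",
]
pattern = sport_filter()["name"]["$regex"]
matches = [n for n in names if re.search(pattern, n)]
print(matches)
```

A case-insensitive match would need the `$options: 'i'` modifier, which would also pick up names such as "transport".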