#  Instructions

## Description

Open the file ks-projects-201801.csv, which lists roughly 100,000 KickStarter projects. Load the data into a Mongo database directly through the Python API.

Make sure to set each document's ID manually. Also take care to convert the fields to appropriate data types so that you can take advantage of the methods Mongo implements. You do not necessarily need the whole dataset; designing the data model is up to you.

## Questions

- 1) Retrieve the 5 projects that received the largest amount in pledges.
- 2) Count the number of projects that reached their goal.
- 3) Count the number of projects in each category.
- 4) Count the number of French projects launched before 2016.
- 5) Retrieve the American projects that asked for more than 200,000 dollars.
- 6) Count the number of projects with "Sport" in their name.

In [1]:
import pandas as pd
import pymongo

In [2]:
client = pymongo.MongoClient("mongo")
database = client['exercices']
collection = database['kickstarter']

In [3]:
df_ks = pd.read_csv("./data/ks-projects-201801-sample.csv")
df_ks.head()

  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,


Unnamed: 0,ID,name,category,main_category,currency,deadline,goal,launched,pledged,state,backers,country,usd pledged,usd_pledged_real
0,872782264,"Scott Cooper's Solo CD ""A Leg Trick"" (Canceled)",Rock,Music,USD,2011-09-16,2000,2011-08-17 06:31:31,1145,canceled,24,US,1145.0,1145.0
1,1326492673,Ohceola jewelry,Fashion,Fashion,USD,2012-08-22,18000,2012-07-23 20:46:48,1851,failed,28,US,1851.0,1851.0
2,1688410639,Sluff Off & Harald: Two latest EGGs are Classi...,Tabletop Games,Games,USD,2016-07-19,2000,2016-07-01 21:55:54,7534,successful,254,US,3796.0,7534.0
3,156812982,SketchPlanner: Create and Plan- all in one bea...,Art Books,Publishing,USD,2017-09-27,13000,2017-08-28 15:47:02,16298,successful,367,US,2670.0,16298.0
4,1835968190,Proven sales with custom motorcycle accessories,Sculpture,Art,CAD,2016-02-24,5000,2016-01-25 17:37:10,1,failed,1,CA,0.708148,0.738225


This warning appears when pandas cannot infer a single data type for a column. Helpfully, it names the columns concerned: 6, 8, 10 and 12.

In [4]:
df_ks.columns[[6,8,10,12]]

Index(['goal', 'pledged', 'backers', 'usd pledged'], dtype='object')

## Question 0

### Clean the data

In [5]:
df_ks.dtypes

ID                    int64
name                 object
category             object
main_category        object
currency             object
deadline             object
goal                 object
launched             object
pledged              object
state                object
backers              object
country              object
usd pledged          object
usd_pledged_real    float64
dtype: object

In [6]:
pd.to_numeric(df_ks['goal'])

ValueError: Unable to parse string "2014-04-17" at position 66141

The error above points to a problem at position 66141. Instead of the goal, that row holds shifted data (here, presumably the deadline value); we check this below.

In [7]:
df_ks.iloc[66141,:]

ID                                        85964225
name                Debut Album from Michael Adam 
category                          Grace is Leaving
main_category                           Indie Rock
currency                                     Music
deadline                                       USD
goal                                    2014-04-17
launched                                     700.0
pledged                        2014-04-02 21:56:35
state                                        850.0
backers                                 successful
country                                         32
usd pledged                                     US
usd_pledged_real                               850
Name: 66141, dtype: object

Everything is indeed shifted, because the title has been split in two: its second part sits in the category column. We therefore drop the row, since its usd_pledged_real value is missing.
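
`pd.to_numeric` stops at the first value it cannot parse, so it reports only one bad position at a time. As a minimal sketch in plain Python, the hypothetical helper `unparseable_positions` below scans a whole column and returns every position whose value cannot be read as a number; the sample values are made up for illustration.

```python
def unparseable_positions(values):
    """Return the positions of values that float() cannot parse."""
    bad = []
    for position, value in enumerate(values):
        try:
            float(value)
        except (TypeError, ValueError):
            bad.append(position)
    return bad


# Made-up 'goal' column: the third entry holds a shifted date.
goal_column = ["2000", "18000", "2014-04-17", "5000"]
print(unparseable_positions(goal_column))
```

On the DataFrame, the same idea could be applied with `unparseable_positions(df_ks['goal'])`. Alternatively, `pd.to_numeric(df_ks['goal'], errors="coerce")` turns unparseable values into NaN instead of raising.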

In [8]:
df_ks.drop(index=66141, inplace=True)

We can now retry the conversion from the previous step!

In [9]:
df_ks=df_ks.astype({"goal":"float64"})

In [10]:
df_ks.drop(['currency', 'deadline', 'pledged', 'backers', 'usd pledged'], axis='columns', inplace=True)
df_ks.head(5)

Unnamed: 0,ID,name,category,main_category,goal,launched,state,country,usd_pledged_real
0,872782264,"Scott Cooper's Solo CD ""A Leg Trick"" (Canceled)",Rock,Music,2000.0,2011-08-17 06:31:31,canceled,US,1145.0
1,1326492673,Ohceola jewelry,Fashion,Fashion,18000.0,2012-07-23 20:46:48,failed,US,1851.0
2,1688410639,Sluff Off & Harald: Two latest EGGs are Classi...,Tabletop Games,Games,2000.0,2016-07-01 21:55:54,successful,US,7534.0
3,156812982,SketchPlanner: Create and Plan- all in one bea...,Art Books,Publishing,13000.0,2017-08-28 15:47:02,successful,US,16298.0
4,1835968190,Proven sales with custom motorcycle accessories,Sculpture,Art,5000.0,2016-01-25 17:37:10,failed,CA,0.738225


### Import the data

In [11]:
dictionnaire = df_ks.to_dict('records')
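
The instructions ask for the document ID to be set manually and for typed fields, whereas the records built here keep `ID` as an ordinary field and `launched` as a string. One possible way to prepare them before insertion is sketched below, assuming that `ID` is unique and that `launched` always follows the `YYYY-MM-DD HH:MM:SS` layout seen in the preview. `to_mongo_doc` is a hypothetical helper, not part of pymongo.

```python
from datetime import datetime


def to_mongo_doc(record):
    """Copy a record, using its ID as _id and parsing 'launched' as a datetime."""
    doc = dict(record)              # leave the original record untouched
    doc["_id"] = doc.pop("ID")      # Mongo uses _id as the primary key
    doc["launched"] = datetime.strptime(doc["launched"], "%Y-%m-%d %H:%M:%S")
    return doc


# Sample record taken from the preview above (fields shortened).
record = {
    "ID": 872782264,
    "name": "Scott Cooper's Solo CD \"A Leg Trick\" (Canceled)",
    "launched": "2011-08-17 06:31:31",
    "country": "US",
}
doc = to_mongo_doc(record)
print(doc["_id"], doc["launched"].year)
```

The prepared list would then be `[to_mongo_doc(r) for r in dictionnaire]`. pymongo stores Python `datetime` values as BSON dates, so any filter on `launched` would then need `datetime` bounds rather than strings.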



In [12]:
dictionnaire

[{'ID': 872782264,
  'name': 'Scott Cooper\'s Solo CD "A Leg Trick" (Canceled)',
  'category': 'Rock',
  'main_category': 'Music',
  'goal': 2000.0,
  'launched': '2011-08-17 06:31:31',
  'state': 'canceled',
  'country': 'US',
  'usd_pledged_real': 1145.0},
 {'ID': 1326492673,
  'name': 'Ohceola jewelry',
  'category': 'Fashion',
  'main_category': 'Fashion',
  'goal': 18000.0,
  'launched': '2012-07-23 20:46:48',
  'state': 'failed',
  'country': 'US',
  'usd_pledged_real': 1851.0},
 {'ID': 1688410639,
  'name': 'Sluff Off & Harald: Two latest EGGs are Classics "old & new"',
  'category': 'Tabletop Games',
  'main_category': 'Games',
  'goal': 2000.0,
  'launched': '2016-07-01 21:55:54',
  'state': 'successful',
  'country': 'US',
  'usd_pledged_real': 7534.0},
 {'ID': 156812982,
  'name': 'SketchPlanner: Create and Plan- all in one beautiful book!',
  'category': 'Art Books',
  'main_category': 'Publishing',
  'goal': 13000.0,
  'launched': '2017-08-28 15:47:02',
  'state': 'success

In [13]:
collection.insert_many(dictionnaire)

<pymongo.results.InsertManyResult at 0x7f016a568a00>

In [55]:
#collection.delete_many({})

<pymongo.results.DeleteResult at 0x7ff092202380>

## Question 1  

In [58]:
# Sort on the server and fetch only the top 5 instead of loading every document
cur = collection.find().sort("usd_pledged_real", pymongo.DESCENDING).limit(5)
list(cur)

[{'_id': ObjectId('600381203789f4ca7f33d9d7'),
  'ID': 342886736,
  'name': "COOLEST COOLER: 21st Century Cooler that's Actually Cooler",
  'category': 'Product Design',
  'main_category': 'Design',
  'goal': 50000.0,
  'launched': '2014-07-08 10:14:37',
  'state': 'successful',
  'country': 'US',
  'usd_pledged_real': 13285226.36},
 {'_id': ObjectId('600381203789f4ca7f3434d0'),
  'ID': 2103598555,
  'name': 'Pebble 2, Time 2 + All-New Pebble Core',
  'category': 'Product Design',
  'main_category': 'Design',
  'goal': 1000000.0,
  'launched': '2016-05-24 15:49:52',
  'state': 'successful',
  'country': 'US',
  'usd_pledged_real': 12779843.49},
 {'_id': ObjectId('600381203789f4ca7f3469eb'),
  'ID': 1033978702,
  'name': 'OUYA: A New Kind of Video Game Console',
  'category': 'Gaming Hardware',
  'main_category': 'Games',
  'goal': 950000.0,
  'launched': '2012-07-10 14:44:41',
  'state': 'successful',
  'country': 'US',
  'usd_pledged_real': 8596474.58},
 {'_id': ObjectId('600381203789

## Question 2

In [61]:
# Cursor.count() is deprecated; count_documents() is its replacement
nb = collection.count_documents({"state": "successful"})
print(nb)

53040




## Question 3

In [62]:
cur = collection.aggregate([{"$group" : {"_id" : "$category", "nombredeprojets" : {"$sum" : 1}}}])
list(cur)

[{'_id': 'Civic Design', 'nombredeprojets': 130},
 {'_id': '3D Printing', 'nombredeprojets': 271},
 {'_id': 'Hardware', 'nombredeprojets': 1431},
 {'_id': 'Music', 'nombredeprojets': 6229},
 {'_id': 'Drama', 'nombredeprojets': 871},
 {'_id': 'Apparel', 'nombredeprojets': 2827},
 {'_id': 'People', 'nombredeprojets': 440},
 {'_id': 'Fashion', 'nombredeprojets': 3379},
 {'_id': 'Experimental', 'nombredeprojets': 357},
 {'_id': 'Small Batch', 'nombredeprojets': 701},
 {'_id': 'Young Adult', 'nombredeprojets': 328},
 {'_id': 'Embroidery', 'nombredeprojets': 49},
 {'_id': 'Webcomics', 'nombredeprojets': 259},
 {'_id': 'Classical Music', 'nombredeprojets': 1064},
 {'_id': 'Product Design', 'nombredeprojets': 8886},
 {'_id': 'Romance', 'nombredeprojets': 74},
 {'_id': 'R&B', 'nombredeprojets': 172},
 {'_id': 'Faith', 'nombredeprojets': 439},
 {'_id': 'Events', 'nombredeprojets': 322},
 {'_id': 'Gaming Hardware', 'nombredeprojets': 178},
 {'_id': 'Illustration', 'nombredeprojets': 1263},
 {'_id
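
The `$group` stage above emits one document per distinct `category`, with `$sum: 1` adding one for each project. As an illustrative sketch on made-up documents, the same counting can be written in plain Python with `collections.Counter`. The pipeline variant at the end, with an extra `$sort` stage to list the largest categories first, is a suggestion that was not run against the collection.

```python
from collections import Counter

# Made-up documents standing in for the collection.
docs = [
    {"category": "Rock"},
    {"category": "Fashion"},
    {"category": "Rock"},
]

# Plain-Python equivalent of grouping by category and summing 1 per document.
counts = Counter(doc["category"] for doc in docs)
print(counts.most_common())

# Same grouping as in the cell above, followed by a descending sort on the count.
pipeline = [
    {"$group": {"_id": "$category", "nombredeprojets": {"$sum": 1}}},
    {"$sort": {"nombredeprojets": -1}},
]
```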

## Question 4

In [63]:
# Cursor.count() is deprecated; several conditions in one filter are ANDed implicitly
nbprojets = collection.count_documents({"launched": {"$lt": "2016-01-01"}, "country": "FR"})
print(nbprojets)

330
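
Comparing `launched` with the string `"2016-01-01"` works here because the timestamps are stored as text in `YYYY-MM-DD HH:MM:SS` form, where alphabetical order matches chronological order. A small sketch checking that equivalence on made-up values, using only the standard library:

```python
from datetime import datetime

# Made-up timestamps in the same layout as the 'launched' field.
launched_values = [
    "2015-12-31 23:59:59",
    "2016-01-01 00:00:00",
    "2011-08-17 06:31:31",
]

# String comparison, as performed on the text field.
before_2016_text = [v for v in launched_values if v < "2016-01-01"]

# The same test after parsing each value into a datetime.
cutoff = datetime(2016, 1, 1)
before_2016_dates = [
    v for v in launched_values
    if datetime.strptime(v, "%Y-%m-%d %H:%M:%S") < cutoff
]

print(before_2016_text == before_2016_dates)
```

The equivalence holds only as long as every value keeps this zero-padded, year-first layout.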




## Question 5

In [66]:
cur = collection.find({"goal": {"$gt": 200000.0}, "country": "US"})
list(cur)

[{'_id': ObjectId('6003811f3789f4ca7f32af55'),
  'ID': 866634482,
  'name': 'A CALL TO ADVENTURE',
  'category': 'Film & Video',
  'main_category': 'Film & Video',
  'goal': 287000.0,
  'launched': '2012-08-13 23:14:02',
  'state': 'failed',
  'country': 'US',
  'usd_pledged_real': 1465.0},
 {'_id': ObjectId('6003811f3789f4ca7f32b008'),
  'ID': 993194166,
  'name': 'Storybricks, the storytelling online RPG',
  'category': 'Video Games',
  'main_category': 'Games',
  'goal': 250000.0,
  'launched': '2012-05-01 20:49:58',
  'state': 'failed',
  'country': 'US',
  'usd_pledged_real': 23680.54},
 {'_id': ObjectId('6003811f3789f4ca7f32b010'),
  'ID': 1147175344,
  'name': 'Shine On New World',
  'category': 'Theater',
  'main_category': 'Theater',
  'goal': 300000.0,
  'launched': '2013-09-30 18:18:40',
  'state': 'failed',
  'country': 'US',
  'usd_pledged_real': 12314.0},
 {'_id': ObjectId('6003811f3789f4ca7f32b01f'),
  'ID': 2012810303,
  'name': 'Nightclub',
  'category': 'Music',
  'ma

## Question 6 

In [75]:
collection.create_index([("name",  "text")])

'name_text'

In [76]:
collection.index_information()

{'_id_': {'v': 2, 'key': [('_id', 1)]},
 'name_text': {'v': 2,
  'key': [('_fts', 'text'), ('_ftsx', 1)],
  'weights': SON([('name', 1)]),
  'default_language': 'english',
  'language_override': 'language',
  'textIndexVersion': 3}}
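
A text index matches whole words after case folding and stemming (the index above uses `default_language: 'english'`), so a search for `sport` finds "Sport" and "Sports" but not a word that merely contains those letters, such as "Transport". A plain substring test behaves differently. The sketch below illustrates the two behaviours with Python's `re` module on made-up names. It only approximates the index (the optional trailing `s` stands in for real stemming), and the filter at the end is one possible regex query for a substring count, not something run here.

```python
import re

# Made-up project names for illustration.
names = [
    "Sport Smart. A New Genre of Sports TV.",
    "Frey Sports App",
    "Public Transport Map",
    "Ohceola jewelry",
]

# Substring match, case-insensitive: also catches "Transport".
substring = re.compile(r"sport", re.IGNORECASE)
# Whole-word match with an optional plural "s": a rough stand-in for stemming.
whole_word = re.compile(r"\bsports?\b", re.IGNORECASE)

substring_count = sum(1 for n in names if substring.search(n))
whole_word_count = sum(1 for n in names if whole_word.search(n))
print(substring_count, whole_word_count)

# Possible Mongo filter for the substring variant (assumption, not run here).
substring_filter = {"name": {"$regex": "sport", "$options": "i"}}
```

Which of the two counts answers the question depends on whether "Sport in the name" is read as a word or as a sequence of letters.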

In [77]:
cur=collection.find({"$text":{ "$search": "sport"}})
list(cur)

[{'_id': ObjectId('6003811f3789f4ca7f335e08'),
  'ID': 1875366029,
  'name': 'Sport Smart. A New Genre of Sports TV. Sport Fans Unite!!!',
  'category': 'Webseries',
  'main_category': 'Film & Video',
  'goal': 4500.0,
  'launched': '2010-12-16 08:40:02',
  'state': 'failed',
  'country': 'US',
  'usd_pledged_real': 25.0},
 {'_id': ObjectId('6003811f3789f4ca7f32dfea'),
  'ID': 1126822169,
  'name': 'Frey Sports App - We connect sports people.',
  'category': 'Apps',
  'main_category': 'Technology',
  'goal': 25000.0,
  'launched': '2016-12-12 11:30:28',
  'state': 'failed',
  'country': 'DK',
  'usd_pledged_real': 0.0},
 {'_id': ObjectId('600381203789f4ca7f34c668'),
  'ID': 1081541783,
  'name': 'Daily Fantasy Sports | Sports Analytics Platform | DFS',
  'category': 'Web',
  'main_category': 'Technology',
  'goal': 33750.0,
  'launched': '2017-11-12 14:48:42',
  'state': 'failed',
  'country': 'US',
  'usd_pledged_real': 0.0},
 {'_id': ObjectId('6003811f3789f4ca7f338f06'),
  'ID': 3778