
In [1]:
# @title Setup
from google.colab import auth
from google.colab import data_table
from google.cloud import bigquery

# Authenticate first so the BigQuery client is created with the user's credentials.
auth.authenticate_user()

project = 'fluent-music-364313'  # Project ID inserted based on the query results selected to explore
location = 'europe-west9'  # Location inserted based on the query results selected to explore
client = bigquery.Client(project=project, location=location)

# Render pandas DataFrames as interactive tables in Colab.
data_table.enable_dataframe_formatter()

In [2]:
query = """
SELECT * from import_boond.sonate_cies_contacts where Ingestion_date = TIMESTAMP('2022-12-23 13:47:11.000','CET'); 
"""
job = client.query(query)

## Reference SQL syntax from the original job
Use the ```QueryJob.query```
[property](https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.QueryJob.html)
to return the SQL text of the job. You can copy it from the output cell
below to edit the query now or later. You can also use
[this link](https://console.cloud.google.com/bigquery?j=fluent-music-364313:europe-west9:bquxjob_681ca93c_184a4051b54)
to go back to BigQuery and edit the query in the BigQuery user interface.
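The console link above stores the job reference in its ```j``` parameter. Based on this one link only, the layout appears to be ```project:location:job_id```. The sketch below splits that value using only the Python standard library. The helper ```job_reference_from_link``` is hypothetical and is not part of any Google library. With the reference recovered, the same job could be reloaded later with ```bigquery.Client.get_job``` instead of running the query again.

```python
from urllib.parse import parse_qs, urlparse

# Console link copied from the markdown cell above.
CONSOLE_LINK = (
    "https://console.cloud.google.com/bigquery"
    "?j=fluent-music-364313:europe-west9:bquxjob_681ca93c_184a4051b54"
)


def job_reference_from_link(link: str) -> tuple[str, str, str]:
    """Hypothetical helper: split the `j` parameter into (project, location, job_id).

    Assumes the project:location:job_id layout seen in the link above.
    rsplit keeps any colon inside a domain-scoped project ID (e.g. "example.com:proj").
    """
    j_value = parse_qs(urlparse(link).query)["j"][0]
    project_id, location_id, job_id = j_value.rsplit(":", 2)
    return project_id, location_id, job_id


link_project, link_location, link_job_id = job_reference_from_link(CONSOLE_LINK)
print(link_project, link_location, link_job_id)

# In the notebook, with the authenticated client from the Setup cell, the job
# could then be reloaded without re-running the query, e.g.:
# job = client.get_job(link_job_id, project=link_project, location=link_location)
```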

In [3]:
print(job.query)


SELECT * from import_boond.sonate_cies_contacts where Ingestion_date = TIMESTAMP('2022-12-23 13:47:11.000','CET'); 



# Result set loaded from BigQuery job as a DataFrame
Query results are read from the job that already ran in BigQuery, referenced by its job ID. You do not need to re-run the query to explore the results. The ```to_dataframe```
[method](https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.QueryJob.html#google.cloud.bigquery.job.QueryJob.to_dataframe)
downloads the results to a pandas DataFrame. It uses the BigQuery Storage API when that API is available.

To edit the query syntax, use the BigQuery SQL editor or the
```Optional:``` sections below.

In [4]:
# Running this code will read results from your previous job
results = job.to_dataframe()
results.head()



|  | Soci_t____nom | Soci_t____Etat | Soci_t____Informations | Soci_t____Adresse | Soci_t____Code_postal | Soci_t____Ville | Soci_t____Pays | Soci_t____Site_web | Soci_t____T_l_phone | Soci_t____Effectif | ... | Contact___Email_3 | Contact___T_l_phone_1 | Contact___T_l_phone_2 | Contact___Note___Date | Contact___Note___Texte | Contact___Provenance | Contact___Provenance___Pr_cisez | Contact___Domaines | Contact___Responsable_manager | Ingestion_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | Prospect |  |  |  |  |  |  |  |  | ... |  |  |  |  | https://fr.linkedin.com/in/guy-maurice-limbio-... | Prospection |  | Data | 6 | 2022-12-23 12:47:11+00:00 |
| 1 | AXA | Prospect |  | 313 terrasse de l'Arche | 92727.0 | Nanterre | France | www.axa.fr | +33 1 40 50 60 70 | 10000 et + | ... |  |  |  |  | https://fr.linkedin.com/in/abear | Prospection |  | Cloud | 6 | 2022-12-23 12:47:11+00:00 |
| 2 | AXA | Prospect |  | 313 terrasse de l'Arche | 92727.0 | Nanterre | France | www.axa.fr | +33 1 40 50 60 70 | 10000 et + | ... |  |  |  |  | https://fr.linkedin.com/in/ericmorali | Prospection |  | Data | 6 | 2022-12-23 12:47:11+00:00 |
| 3 | AXA | Prospect |  | 313 terrasse de l'Arche | 92727.0 | Nanterre | France | www.axa.fr | +33 1 40 50 60 70 | 10000 et + | ... |  |  |  |  | https://fr.linkedin.com/in/nicolas-shire | Prospection |  | Data | 6 | 2022-12-23 12:47:11+00:00 |
| 4 | AXA | Prospect |  | 313 terrasse de l'Arche | 92727.0 | Nanterre | France | www.axa.fr | +33 1 40 50 60 70 | 10000 et + | ... |  |  |  |  | https://fr.linkedin.com/in/nicolas-shire | Prospection |  | Data | 6 | 2022-12-23 12:47:11+00:00 |
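The query filtered on ```TIMESTAMP('2022-12-23 13:47:11.000','CET')```, yet ```Ingestion_date``` shows ```2022-12-23 12:47:11+00:00```. Both values are the same instant. BigQuery ```TIMESTAMP``` values are absolute points in time and come back in UTC, and CET is UTC+1 in December. A quick pandas check of the conversion:

```python
import pandas as pd

# Literal from the WHERE clause, read in the CET time zone.
ingestion_cet = pd.Timestamp("2022-12-23 13:47:11", tz="CET")

# Same instant expressed in UTC, as it appears in the Ingestion_date column.
ingestion_utc = ingestion_cet.tz_convert("UTC")
print(ingestion_utc)  # 2022-12-23 12:47:11+00:00
```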


## Show descriptive statistics using describe()
Use the ```pandas DataFrame.describe()```
[method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html)
to generate descriptive statistics. Descriptive statistics include those that
summarize the central tendency, dispersion and shape of a dataset’s
distribution, excluding ```NaN``` values. You may also use other Python methods
to interact with your data.
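With ```include='all'```, ```describe()``` summarizes every column, whatever its type. Non-numeric columns get ```count```, ```unique```, ```top``` and ```freq```. Numeric columns get ```mean```, ```std``` and percentiles. Statistics that do not apply to a column are left as ```NaN```. Here is a small sketch on a toy DataFrame. The column names and values are made up and do not come from the contacts table.

```python
import pandas as pd

# Toy frame with made-up values: a text column with one missing value and an integer column.
toy = pd.DataFrame(
    {
        "company": ["AXA", "AXA", "Crédit Agricole", None],
        "manager_id": [6, 6, 3, 6],
    }
)

# include="all" summarizes text and numeric columns in a single table.
summary = toy.describe(include="all")
print(summary)
```

Here ```company``` has a ```count``` of 3 because the missing value is skipped. It has 2 ```unique``` values, with AXA as ```top``` and a ```freq``` of 2. ```manager_id``` has a ```mean``` of 5.25. The ```results.describe(include='all')``` output shows the same pattern: ```Soci_t____nom``` follows the text-column layout and ```Contact___Responsable_manager``` the numeric one.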

In [5]:
results.describe(include='all')




|  | Soci_t____nom | Soci_t____Etat | Soci_t____Informations | Soci_t____Adresse | Soci_t____Code_postal | Soci_t____Ville | Soci_t____Pays | Soci_t____Site_web | Soci_t____T_l_phone | Soci_t____Effectif | ... | Contact___Email_3 | Contact___T_l_phone_1 | Contact___T_l_phone_2 | Contact___Note___Date | Contact___Note___Texte | Contact___Provenance | Contact___Provenance___Pr_cisez | Contact___Domaines | Contact___Responsable_manager | Ingestion_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1322 | 1323 | 0.0 | 1322 | 1323.0 | 1322 | 1322 | 1322 | 1322 | 1322 | ... | 0.0 | 61.0 | 0.0 | 0.0 | 1221 | 1323 | 0.0 | 315 | 1323.0 | 1323 |
| unique | 18 | 2 | 0.0 | 18 | 14.0 | 9 | 2 | 18 | 18 | 8 | ... | 0.0 | 5.0 | 0.0 | 0.0 | 316 | 2 | 0.0 | 19 |  | 1 |
| top | Crédit Agricole | Client |  | 12, place des Etats-Unis | 92127.0 | Montrouge | France | https://www.credit-agricole.fr/ | +33 1 57 72 90 45 | 1000 à 1999 | ... |  |  |  |  | nan nan | Prospection |  | Devops |  | 2022-12-23 12:47:11+00:00 |
| freq | 821 | 1223 |  | 821 | 821.0 | 821 | 821 | 821 | 821 | 833 | ... |  | 49.0 |  |  | 535 | 1120 |  | 152 |  | 1323 |
| first |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  | 2022-12-23 12:47:11+00:00 |
| last |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  | 2022-12-23 12:47:11+00:00 |
| mean |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  | 5.419501 |  |
| std |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  | 1.185572 |  |
| min |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  | 3.0 |  |
| 25% |  |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  | 6.0 |  |


In [6]:
cols = results.columns

In [7]:
import pandas as pd
prf = pd.DataFrame()

In [8]:
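# Profile each column: number of missing values and frequency of each value.
for c in cols:
  nb_null = results[c].isna().sum()
  freq = results[c].value_counts()  # NaN values are excluded by default
  # One-row DataFrame; the whole value_counts() Series is stored in a single cell.
  d = pd.DataFrame(data = {'nom_col': c, 'nb_null': [nb_null], 'freq': [freq]})
  # DataFrame.append was removed in pandas 2.0, so rows are stacked with pd.concat.
  prf = pd.concat([prf,d])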


In [9]:
prf.head()

|  | nom_col | nb_null | freq |
|---|---|---|---|
| 0 | Soci_t____nom | 1 | Crédit Agricole 821 Société Générale ... |
| 0 | Soci_t____Etat | 0 | Client 1223 Prospect 100 Name: Soci_t... |
| 0 | Soci_t____Informations | 1323 | Series([], Name: Soci_t____Informations, dtype... |
| 0 | Soci_t____Adresse | 1 | 12, place des Etats-Unis 821 29... |
| 0 | Soci_t____Code_postal | 0 | 92127 821 75009 272 92930 190 75007 ... |
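Every row of ```prf``` carries the index label 0. Each one-row DataFrame built in the loop starts with its own default index at 0, and ```pd.concat``` keeps those labels unless ```ignore_index=True``` is passed. A small illustration with made-up column names:

```python
import pandas as pd

# Two one-row frames, each with a default index starting at 0
# (hypothetical column names and counts, mirroring the loop).
pieces = [
    pd.DataFrame({"nom_col": ["col_a"], "nb_null": [1]}),
    pd.DataFrame({"nom_col": ["col_b"], "nb_null": [0]}),
]

stacked = pd.concat(pieces)  # keeps each piece's label 0
renumbered = pd.concat(pieces, ignore_index=True)  # fresh 0..n-1 index

print(list(stacked.index))  # [0, 0]
print(list(renumbered.index))  # [0, 1]
```

With duplicate labels, ```stacked.loc[0]``` returns both rows rather than one. Passing ```ignore_index=True``` in the loop would make lookups by label unambiguous.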


In [10]:
nb_records = len(results)
nb_records

1323

In [11]:
# Percentage of missing values per column (vectorized, no per-row lambda needed).
prf['pct_null'] = 100 * prf['nb_null'] / nb_records

In [12]:
prf

|  | nom_col | nb_null | freq | pct_null |
|---|---|---|---|---|
| 0 | Soci_t____nom | 1 | Crédit Agricole 821 Société Générale ... | 0.075586 |
| 0 | Soci_t____Etat | 0 | Client 1223 Prospect 100 Name: Soci_t... | 0.0 |
| 0 | Soci_t____Informations | 1323 | Series([], Name: Soci_t____Informations, dtype... | 100.0 |
| 0 | Soci_t____Adresse | 1 | 12, place des Etats-Unis 821 29... | 0.075586 |
| 0 | Soci_t____Code_postal | 0 | 92127 821 75009 272 92930 190 75007 ... | 0.0 |
| 0 | Soci_t____Ville | 1 | Montrouge 821 Paris ... | 0.075586 |
| 0 | Soci_t____Pays | 1 | France 821 France 501 Name: Soci_t____... | 0.075586 |
| 0 | Soci_t____Site_web | 1 | https://www.credit-agricole.fr/ 821 http... | 0.075586 |
| 0 | Soci_t____T_l_phone | 1 | +33 1 57 72 90 45 821 +33 1 42 14 20 00 ... | 0.075586 |
| 0 | Soci_t____Effectif | 1 | 1000 à 1999 833 10000 et + 267 2000 à... | 0.075586 |
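The profiling loop grows ```prf``` with ```pd.concat``` on every iteration, and each call builds a new DataFrame from everything accumulated so far. The sketch below is one alternative. It assumes the same fields (```nom_col```, ```nb_null```, ```freq```, ```pct_null```), collects each field as a list, and builds the DataFrame once, which also gives a clean 0..n-1 index. ```profile_columns``` and the toy data are hypothetical and are not part of the original notebook.

```python
import pandas as pd


def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column with null count, value counts and null %."""
    columns = list(df.columns)
    prf = pd.DataFrame(
        {
            "nom_col": columns,
            "nb_null": [int(df[c].isna().sum()) for c in columns],
            # Each cell holds the column's full value_counts() Series (NaN excluded).
            "freq": [df[c].value_counts() for c in columns],
        }
    )
    # Vectorized share of missing values, in percent of the row count.
    prf["pct_null"] = 100 * prf["nb_null"] / len(df)
    return prf


# Toy data (made-up values) standing in for the query results.
toy = pd.DataFrame(
    {
        "company": ["AXA", "AXA", None, "Crédit Agricole"],
        "status": ["Client", "Prospect", "Client", "Client"],
    }
)
profile = profile_columns(toy)
print(profile[["nom_col", "nb_null", "pct_null"]])
```

In the notebook, the equivalent call would be ```prf = profile_columns(results)```.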
