<a href="https://colab.research.google.com/github/emmanuelvaie/google_colab/blob/main/BigQueryAnalysis_profiles.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# @title Setup
from google.colab import auth
from google.cloud import bigquery
from google.colab import data_table

# Authenticate first so the BigQuery client is created with the user's credentials
auth.authenticate_user()

project = 'fluent-music-364313' # Project ID inserted based on the query results selected to explore
location = 'europe-west9' # Location inserted based on the query results selected to explore
client = bigquery.Client(project=project, location=location)
# Render DataFrames as interactive tables in Colab
data_table.enable_dataframe_formatter()

In [2]:
query = """
SELECT * FROM import_boond.sonate_profiles where Ingestion_date='2022-12-02 13:32:05.215497'; 
"""
job = client.query(query)

## Reference SQL syntax from the original job
The ```query``` property of the job returns the SQL text that was submitted
(see the ```jobs.query```
[method](https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query)
in the REST reference). Copy it from the output cell below to edit the query
now or later. Alternatively, follow
[this link](https://console.cloud.google.com/bigquery?j=fluent-music-364313:europe-west9:bquxjob_681ca93c_184a4051b54)
back to BigQuery to edit the query in the BigQuery user interface.

In [3]:
print(job.query)


SELECT * FROM import_boond.sonate_profiles where Ingestion_date='2022-12-02 13:32:05.215497'; 



# Result set loaded from BigQuery job as a DataFrame
Query results are read from the query job created above, so the query does
not need to be re-run to explore them. The ```to_dataframe```
[method](https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.QueryJob.html#google.cloud.bigquery.job.QueryJob.to_dataframe)
downloads the results to a pandas DataFrame by using the BigQuery Storage API.

To edit the query syntax, use the BigQuery SQL editor or the
```Optional:``` sections below.

In [4]:
# Running this code will read results from your previous job
results = job.to_dataframe()
results.head()



Unnamed: 0,Civilit__,Pr__nom,Nom,Type,Etape,Titre,Comp__tences,Evaluation_Globale,Date_de_Naissance,Nationalit__,...,CV2,CV3,CV4,CV5,Mobilit__,Langues,Domaines,Secteurs,Outils,Ingestion_date
0,M,Hocine,MEZGHICHE,Consultant Externe,Vivier sourcing,Big Data Engineer,"scala, machine learning, aws, master, big data...",Inconnue,,Inconnue,...,,,,,Inconnue,,"Cloud, Data",,"scala : 3, machine learning : 3, aws : 3, mast...",2022-12-02 13:32:05.215497
1,M,Jérémie,Dourneau,Consultant Externe,Vivier sourcing,Data ingénieur,"hadoop, python, scala, spark, oracle, agile, e...",Inconnue,,Inconnue,...,,,,,Inconnue,,Data,,"hadoop : 3, python : 3, scala : 3, spark : 3, ...",2022-12-02 13:32:05.215497
2,M,Johnny,KNOBLAUCH,Consultant Externe,Vivier sourcing,.net developer,"management, architect, graphql, bi, saas, type...",Inconnue,,Inconnue,...,,,,,Inconnue,,Data,,"management : 3, architect : 3, graphql : 3, bi...",2022-12-02 13:32:05.215497
3,M,Slim,SOUMRI,Consultant Externe,Vivier sourcing,freelancer java,"docker, npm, javascript, apache, kubernetes, g...",Inconnue,,Inconnue,...,,,,,Inconnue,,RD,,"docker : 3, npm : 3, javascript : 3, apache : ...",2022-12-02 13:32:05.215497
4,M,Mustapha,ALIANE,Consultant Externe,Vivier sourcing,freelancer JAVA,"apache, css, master, xml, sql, data, api, reac...",Inconnue,,Inconnue,...,,,,,Inconnue,,"RD, Data",,"apache : 5, css : 5, master : 5, xml : 5, sql ...",2022-12-02 13:32:05.215497


## Show descriptive statistics using describe()
Use the ```pandas DataFrame.describe()```
[method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html)
to generate descriptive statistics. Descriptive statistics include those that
summarize the central tendency, dispersion and shape of a dataset’s
distribution, excluding ```NaN``` values. You may also use other Python methods
to interact with your data.
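
As a rough illustration of how ```describe(include='all')``` fills its rows, the sketch below runs it on a small invented DataFrame (the column names and values are assumptions, not data from the query). Text columns get ```count```, ```unique```, ```top``` and ```freq```; numeric columns get ```mean```, ```std``` and the quantiles; every statistic that does not apply to a column's type is reported as ```NaN```.

```python
import pandas as pd

# Invented toy data: one text column and one numeric column, each with a null.
toy = pd.DataFrame({
    "Type": ["Externe", "Externe", "Interne", None],
    "Score": [3.0, 5.0, None, 4.0],
})

# include="all" summarizes every column, whatever its dtype.
summary = toy.describe(include="all")
print(summary)

# Text column: null values are excluded from count, unique, top and freq.
print(summary.loc[["count", "unique", "top", "freq"], "Type"])

# Numeric column: mean is computed over the non-null values only.
print(summary.loc["mean", "Score"])
```

This is why a result set made up almost entirely of text columns shows empty ```mean```, ```std``` and quantile rows.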

In [5]:
results.describe(include='all')



Unnamed: 0,Civilit__,Pr__nom,Nom,Type,Etape,Titre,Comp__tences,Evaluation_Globale,Date_de_Naissance,Nationalit__,...,CV2,CV3,CV4,CV5,Mobilit__,Langues,Domaines,Secteurs,Outils,Ingestion_date
count,977,977,901,977,977,944,781,977,78,977,...,186,38,9,3,971,157,771,34,781,977
unique,1,547,755,2,9,514,666,5,63,3,...,157,35,9,3,24,17,19,20,672,1
top,M,Mohamed,Ahmed,Consultant Externe,Vivier,Devops,"apache, management, aws lambda, aws, perl, php...",Inconnue,19/08/1990,Inconnue,...,Sonate_Hamza M.docx,"CodinGame - Docker,_Kubernetes,_Python_3_-_Sen...",CodinGame - Développeur_Python_-_Senior - Habi...,CodinGame - Développeur_Full_Stack_Django_-_Se...,Inconnue,", Anglais : Inconnue","RD, Data",banque assurance,"apache : 3, management : 3, aws lambda : 3, aw...",2022-12-02 13:32:05.215497
freq,977,34,4,944,345,25,3,619,2,967,...,3,2,1,1,898,78,211,5,3,977
mean,,,,,,,,,,,...,,,,,,,,,,
std,,,,,,,,,,,...,,,,,,,,,,
min,,,,,,,,,,,...,,,,,,,,,,
25%,,,,,,,,,,,...,,,,,,,,,,
50%,,,,,,,,,,,...,,,,,,,,,,
75%,,,,,,,,,,,...,,,,,,,,,,


In [6]:
cols = results.columns

In [7]:
import pandas as pd
prf = pd.DataFrame()

In [8]:
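# Build a one-row profile per column: null count and value frequencies
frames = []
for c in cols:
  nb_null = results[c].isna().sum()
  freq = results[c].value_counts()
  frames.append(pd.DataFrame(data = {'nom_col': c, 'nb_null': [nb_null], 'freq': [freq]}))
# DataFrame.append was removed in pandas 2.0; concatenate once instead.
# Every row keeps index 0, as before.
prf = pd.concat(frames)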
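
The loop above stores a whole ```value_counts()``` Series in each ```freq``` cell, which is hard to sort or filter on. One possible alternative, sketched here on invented toy data, keeps only scalar values per column. The helper name ```profile_columns``` and the columns ```nb_unique```, ```top``` and ```top_freq``` are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for `results`; values are invented for illustration only.
toy = pd.DataFrame({
    "Type": ["Externe", "Externe", "Interne", "Externe"],
    "Titre": ["Devops", None, "Devops", "Data Engineer"],
})

def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: null count, distinct count, most frequent value."""
    rows = []
    for col in df.columns:
        counts = df[col].value_counts()  # excludes nulls, sorted by frequency
        rows.append({
            "nom_col": col,
            "nb_null": int(df[col].isna().sum()),
            "nb_unique": int(counts.size),
            "top": counts.index[0] if not counts.empty else None,
            "top_freq": int(counts.iloc[0]) if not counts.empty else 0,
        })
    # Building from a list of dicts gives a clean 0..n-1 index.
    return pd.DataFrame(rows)

flat_prf = profile_columns(toy)
print(flat_prf)
```

Calling ```profile_columns(results)``` would give the same kind of table for the query results, assuming ```results``` is the DataFrame loaded earlier.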


In [9]:
prf.head()

Unnamed: 0,nom_col,nb_null,freq
0,Civilit__,0,"M 977 Name: Civilit__, dtype: int64"
0,Pr__nom,0,Mohamed 34 Youssef 18 Ahmed ...
0,Nom,76,Ahmed 4 TAHAR ...
0,Type,0,Consultant Externe 944 Consultant Interne ...
0,Etape,0,Vivier 345 No Go Sonat...


In [10]:
nb_records = len(results)
nb_records

977

In [11]:
# Share of null values per column, as a percentage of all records
prf['pct_null'] = 100 * prf['nb_null'].astype(float) / nb_records
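
The same percentage can also be obtained directly from the data, without going through the profile table. The sketch below uses an invented two-column DataFrame in place of ```results``` and relies on the fact that the mean of a boolean column is the share of ```True``` values.

```python
import pandas as pd

# Invented stand-in for `results`; only the null pattern matters here.
toy = pd.DataFrame({
    "Nom": ["A", None, "C", "D"],
    "Date_de_Naissance": [None, None, None, "19/08/1990"],
})

# isna() gives a boolean frame; the mean of booleans is the share of True.
pct_null = toy.isna().mean() * 100

# Same figure computed the long way, as null count / number of records.
by_count = 100 * toy.isna().sum() / len(toy)

# Sorting puts the sparsest columns first.
print(pct_null.sort_values(ascending=False))
```

Sorting by this value is a quick way to spot columns that are mostly empty.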

In [12]:
prf

Unnamed: 0,nom_col,nb_null,freq,pct_null
0,Civilit__,0,"M 977 Name: Civilit__, dtype: int64",0.0
0,Pr__nom,0,Mohamed 34 Youssef 18 Ahmed ...,0.0
0,Nom,76,Ahmed 4 TAHAR ...,7.778915
0,Type,0,Consultant Externe 944 Consultant Interne ...,0.0
0,Etape,0,Vivier 345 No Go Sonat...,0.0
0,Titre,33,Devops 25 Data Enginee...,3.377687
0,Comp__tences,196,"apache, management, aws lambda, aws, perl, php...",20.061412
0,Evaluation_Globale,0,Inconnue 619 D 273 C 3...,0.0
0,Date_de_Naissance,899,19/08/1990 2 30/11/1990 2 ...,92.016377
0,Nationalit__,0,Inconnue 967 Non Français 7 França...,0.0
