
New Argument Google.Cloud.Bigquery.Magics --export_to_csv #72

Closed
julioyildo opened this issue Apr 2, 2020 · 7 comments
Labels
api: bigquery - Issues related to the googleapis/python-bigquery API.
type: feature request - ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@julioyildo

julioyildo commented Apr 2, 2020

New Argument --export_to_csv

Hello,

For our business use case, we need an argument that can export query results as a CSV file to GCS directly from a magic query.
We think this new argument could enrich the library.

`--export_to_csv`
    Args:
    ------
        "filename" (str):
            GCS destination URI (e.g. "gs://bucket_name/filename.csv")

        "temporary_dataset" (str):
            Dataset where the temporary table is created in BigQuery
            before being exported as CSV to GCS and then deleted from BigQuery.
            --
            IMPORTANT: required if `--destination_table` is not used in the same magic query.

Use Cases:

A) Without --destination_table

to_gcs = {"filename" : "gs://bucket_name/emails_280319.csv", 
                "temporary_dataset" : "dataset_id"
                 }
%%biquery --export_to_csv $to_gcs
SELECT emails, names
FROM `project_name.dataset_name.table_name`

result :
1 - create a tomparary table with the query result
2 - export the table to gcs
3 - delete the temporary table
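For reference, the same three steps can already be done with the plain client library outside of cell magic (a sketch, assuming a recent google-cloud-bigquery version that accepts string table IDs; the project, dataset, table and bucket names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
tmp_table = "project_name.dataset_id.tmp_emails_280319"  # placeholder temporary table

# 1 - create a temporary table with the query result
job_config = bigquery.QueryJobConfig(destination=tmp_table)
sql = "SELECT emails, names FROM `project_name.dataset_name.table_name`"
client.query(sql, job_config=job_config).result()

# 2 - export the table to GCS (CSV is the default extract format)
client.extract_table(tmp_table, "gs://bucket_name/emails_280319.csv").result()

# 3 - delete the temporary table
client.delete_table(tmp_table)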

B) With --destination_table

output_table = "consumers.emails_280319"
to_gcs = {"filename": "gs://bucket_name/emails_280319.csv"}

In this case we don't need to declare "temporary_dataset",
because the user wants to save the result in BigQuery before exporting it to GCS.

%%bigquery --destination_table $output_table --export_to_csv $to_gcs
SELECT emails, names
FROM `project_name.dataset_name.table_name`

result:
1 - save the query result in projectName.consumers.emails_280319
2 - export the table to GCS

Code Snippet:

  • New function _handle_export_to_csv:

def _handle_export_to_csv(client, destination_uri, destination_table_ref):
    """Start an extract job that exports the table to GCS as a CSV file."""
    extract_job = client.extract_table(
        destination_table_ref,
        destination_uri,
        location="EU",  # hard-coded for now; should ideally follow the query job's location
    )
    return extract_job
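The caller should block on the returned job so the CSV file is complete before any cleanup happens (a sketch; the URI is a placeholder):

extract_job = _handle_export_to_csv(
    client,
    "gs://bucket_name/emails_280319.csv",
    destination_table_ref,
)
extract_job.result()  # raises if the export failed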
  • In _cell_magic() (requires `ast` and `uuid` imports at module level):

         
        job_config = bigquery.job.QueryJobConfig()
        job_config.query_parameters = params
        job_config.use_legacy_sql = args.use_legacy_sql
        job_config.dry_run = args.dry_run
        ######################
        # --destination_table
        ######################

        extract_job = None
        destination_table_ref = None

        if args.destination_table:
            split = args.destination_table.split(".")

            if len(split) != 2:
                raise ValueError(
                    "--destination_table should be in a <dataset_id>.<table_id> format."
                )

            dataset_id, table_id = split

            dataset_ref = bigquery.dataset.DatasetReference(client.project, dataset_id)
            destination_table_ref = dataset_ref.table(table_id)

            # Route the query result into the destination table.
            job_config.destination = destination_table_ref
            job_config.allow_large_results = True
            job_config.create_disposition = "CREATE_IF_NEEDED"
            job_config.write_disposition = "WRITE_TRUNCATE"

            _create_dataset_if_necessary(client, dataset_id)
        ######################
        # --export_to_csv
        ######################

        if args.export_to_csv:
            # Parse the dict that was passed to the magic argument.
            export_csv_params = ast.literal_eval("".join(args.export_to_csv))
            len_export_csv_params = len(export_csv_params)

            if destination_table_ref is None:
                # The user wants to --export_to_csv without --destination_table,
                # so a temporary table is needed.
                if (not 0 < len_export_csv_params < 3) or (
                    not export_csv_params.get("filename")
                    or not export_csv_params.get("temporary_dataset")
                ):
                    raise ValueError(
                        "--export_to_csv should be in a "
                        "{'filename': 'gs://bucket_name/filename', "
                        "'temporary_dataset': 'dataset_id'} format."
                    )

                temporary_table = str(uuid.uuid4()).replace("-", "_")
                destination_table_ref = (
                    client.project
                    + "."
                    + export_csv_params["temporary_dataset"]
                    + "."
                    + temporary_table
                )
                # Route the query result into the temporary table.
                job_config.destination = destination_table_ref

            elif (len_export_csv_params != 1) or (not export_csv_params.get("filename")):
                # The user passed both --destination_table and --export_to_csv;
                # the temporary dataset is not needed.
                raise ValueError(
                    "--export_to_csv should be in a "
                    "{'filename': 'gs://bucket_name/filename'} format. "
                    "No need to pass the 'temporary_dataset' argument."
                )

            destination_uri = export_csv_params["filename"]
        try:
            # QUERY
            query_job = _run_query(client, query, job_config=job_config)
            query_job.result()

            # EXPORT TO GCS
            if args.export_to_csv:
                extract_job = _handle_export_to_csv(
                    client, destination_uri, destination_table_ref
                )
                extract_job.result()  # wait for the export to finish before cleanup
                print("Export to GCS")

        except Exception as ex:
            _handle_error(ex, args.destination_var)  # TODO: revisit the error handling?
            return
        result = query_job.to_dataframe()

        # DELETE TEMPORARY TABLE
        if not args.destination_table and args.export_to_csv:
            client.delete_table(destination_table_ref)
            print("Delete temporary table")

        # REST OF CODE
        if args.destination_var:
            IPython.get_ipython().push({args.destination_var: result})
        else:
            return result


Would you know if a similar feature is planned for the library?

Thank you very much.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Apr 2, 2020
@julioyildo julioyildo changed the title from New Argument Google.Cloud.Bigquery.Magics --export_to_csv` to New Argument Google.Cloud.Bigquery.Magics --export_to_csv Apr 2, 2020
@meredithslota meredithslota added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Apr 2, 2020
@plamut
Contributor

plamut commented Apr 3, 2020

@julioyildo It's not planned AFAIK, but exporting data as CSV actually sounds like something that quite a few people might find useful (IMHO). I will bring up the proposal at the next meeting.

Also, thanks for the detailed specs and even a sample implementation!

@julioyildo
Author

Thank you @plamut ! :)

@plamut
Copy link
Contributor

plamut commented Apr 8, 2020

@julioyildo A quick update - directly implementing the export-to-csv functionality in cell magic is not planned, but there are plans to add a dedicated "export API" on the backend. That API can then be integrated in cell magic.

It's not something that can be expected in the very near future, though.

@NI1993

NI1993 commented Jun 7, 2021

@julioyildo A quick update - directly implementing the export-to-csv functionality in cell magic is not planned, but there are plans to add a dedicated "export API" on the backend. That API can then be integrated in cell magic.

It's not something that can be expected in the very near future, though.

@plamut Any updates that you know of regarding this feature?

It would be extremely helpful for me as well.

@plamut
Contributor

plamut commented Jun 7, 2021

@NI1993 I can ask if anything has changed in this regard, will post any updates under your feature request.

Edit: I see that the original feature request is about the Jupyter notebook cell magic, so thanks for opening a new request.

@NI1993

NI1993 commented Jun 7, 2021

@plamut Thanks, highly appreciated!

@shollyman
Contributor

Revisiting this FR as it's languished for a bit.

I'm going to close this out, as this seems like a clear win for the EXPORT DATA statement, which gets query results quickly and easily to a CSV file in GCS.

See https://cloud.google.com/bigquery/docs/exporting-data#sql for examples of using it, but the core idea is that you can run exports via the query language, rather than needing to accomplish this in two stages.
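For example, use case A from the original request collapses into a single cell (a sketch; the bucket and table names are placeholders, and note that EXPORT DATA expects a wildcard in the URI so the output can be split across multiple files):

%%bigquery
EXPORT DATA OPTIONS (
  uri = 'gs://bucket_name/emails_280319_*.csv',
  format = 'CSV',
  overwrite = true,
  header = true
) AS
SELECT emails, names
FROM `project_name.dataset_name.table_name`

No temporary table or client-side cleanup is needed; BigQuery handles the export as part of the query itself.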
