
implement Arrow2's odbc reader and writers #2994

Open
ritchie46 opened this issue Mar 28, 2022 · 6 comments
Labels: enhancement, good first issue

Comments

@ritchie46
Member

We now have native ODBC support upstream. This has to be exposed in polars similarly to existing IO readers and writers.
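For context, the existing polars readers are exposed as builders that are configured and then finish()-ed, and presumably an ODBC reader would mirror that shape. A rough sketch of the existing pattern (CsvReader as it looked around the time of this issue; nothing ODBC-specific is implied here):

use polars::prelude::*;

// Existing polars reader pattern: construct, configure, finish.
// An ODBC reader would presumably expose a similar builder.
fn read_csv_example() -> PolarsResult<DataFrame> {
    CsvReader::from_path("data.csv")?
        .has_header(true)
        .finish()
}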

@trickster

trickster commented Jun 28, 2022

Here is a preliminary version I came up with.

It only works with string columns (only UTF8 is implemented).

use arrow2::error::Result;
use arrow2::io::odbc::api::Cursor;
use arrow2::io::odbc::{api, read};
use polars::prelude::*;
use std::sync::Arc;

const QUERY: &str = include_str!("../query.sql");

fn main() -> Result<()> {
    let connector = "ODBC_STRING";

    let env = api::Environment::new()?;
    let connection = env.connect_with_connection_string(connector)?;
    let mut prep = connection.prepare(QUERY)?;

    let fields = read::infer_schema(&prep)?;

    let mut df = fields
        .iter()
        .map(|s| match s.data_type {
            ArrowDataType::Utf8 => Series::new_empty(&s.name, &DataType::Utf8),
            _ => unimplemented!(),
        })
        .collect::<Vec<_>>();

    let max_batch_size = 100;
    let buffer = read::buffer_from_metadata(&prep, max_batch_size)?;

    let cursor = prep.execute(())?.unwrap();
    let mut cursor = cursor.bind_buffer(buffer)?;

    while let Some(batch) = cursor.fetch()? {
        // Deserialize each fetched column and append it to its Series.
        for ((idx, field), df_elem) in (0..batch.num_cols()).zip(fields.iter()).zip(df.iter_mut()) {
            let column_view = batch.column(idx);
            let arr = Arc::from(read::deserialize(column_view, field.data_type.clone()));
            let series = Series::try_from((field.name.as_str(), vec![arr])).unwrap();
            df_elem.append(&series).unwrap();
        }
    }

    let dataframe = DataFrame::new(df).unwrap();
    dbg!(dataframe);
    Ok(())
}

Ideally we would use this function, although it operates on all chunks together rather than on individual ones.

Edit: Series::try_from would be enough
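Extending the snippet beyond strings should mostly be a matter of adding match arms where the empty Series are pre-allocated; the fetch loop is already dtype-agnostic because read::deserialize returns a type-erased array. A hedged sketch (the non-Utf8 arms are untested assumptions):

// Hypothetical extension of the match above; the extra arms follow the
// obvious arrow2 -> polars dtype mapping but are untested.
let mut df = fields
    .iter()
    .map(|s| match s.data_type {
        ArrowDataType::Utf8 => Series::new_empty(&s.name, &DataType::Utf8),
        ArrowDataType::Boolean => Series::new_empty(&s.name, &DataType::Boolean),
        ArrowDataType::Int32 => Series::new_empty(&s.name, &DataType::Int32),
        ArrowDataType::Int64 => Series::new_empty(&s.name, &DataType::Int64),
        ArrowDataType::Float64 => Series::new_empty(&s.name, &DataType::Float64),
        _ => unimplemented!(),
    })
    .collect::<Vec<_>>();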

@cnphil

cnphil commented Aug 19, 2022

I'm tempted to work on this. I'll draft a PR over the weekend.

@trickster

I got a working version (it can infer the schema) here.

@stinodego added the enhancement label and removed the feature label on Jul 14, 2023
@cnpryer
Contributor

cnpryer commented Jul 16, 2023

I see this is still open. Is there interest in this?

@sportfloh
Contributor

Hey @cnpryer, yes :-)

I have a similar requirement.
Currently I use odbc-api to get data from a DB2 (IBM i) database.
I tried to use arrow-odbc, but I didn't find a way to create a polars DataFrame from an arrow RecordBatch.
It would be really nice if something like the Python from_arrow function could be implemented in the Rust API.
Or maybe I just missed a simple way to do it?
Thanks and Cheers!

@sportfloh
Contributor

Hi,
here is my current solution with arrow_odbc, arrow, and polars_arrow:
pacman82/odbc-api#536 (comment)

I found the arrow RecordBatch to DataFrame code here:
https://stackoverflow.com/questions/78084066/arrow-recordbatch-as-polars-dataframe
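For anyone who wants a simple, copy-incurring alternative to the zero-copy FFI conversion in that answer: the batch can be round-tripped through the Arrow IPC stream format, which polars reads natively. A minimal sketch, assuming arrow-rs plus a polars build with the ipc_streaming feature (record_batch_to_df is a made-up helper name):

use std::io::Cursor;

use arrow::ipc::writer::StreamWriter;
use arrow::record_batch::RecordBatch;
use polars::prelude::*;

// Hypothetical helper: serialize the batch into an in-memory Arrow IPC
// stream, then let polars deserialize it. This copies the data once,
// unlike the zero-copy FFI route from the linked answer.
fn record_batch_to_df(batch: &RecordBatch) -> PolarsResult<DataFrame> {
    let to_polars = |e: arrow::error::ArrowError| PolarsError::ComputeError(e.to_string().into());

    let mut buf = Vec::new();
    let mut writer = StreamWriter::try_new(&mut buf, &batch.schema()).map_err(to_polars)?;
    writer.write(batch).map_err(to_polars)?;
    writer.finish().map_err(to_polars)?;
    drop(writer); // release the mutable borrow on buf

    IpcStreamReader::new(Cursor::new(buf)).finish()
}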
