Move BigQuery backend to a separate repository #2665

Closed
datapythonista opened this issue Mar 5, 2021 · 7 comments

@datapythonista
Contributor

I think it would make sense to move BigQuery (probably more backends too, but let's start by discussing BigQuery) into a separate repository, in the same way as we did for OmniSciDB in #2356.

I think in the past it probably made sense to have all backends in this same repo, but now I think it makes more sense to reduce the number of backends we develop here. Some of the reasons:

  • We have 14 backends in this repo, at least 3 or 4 in separate repos, and new ones being developed. Keeping all of them here is no longer an option
  • With the new entrypoints system that we just merged (Using entry points for loading backends #2379), the API for the user will be exactly the same
  • Users will have to install a separate conda package, but I think this is better than having to install the backend dependencies individually, or installing all backend dependencies with Ibis
  • Related to the previous point, we're already planning to have different conda-forge packages for some backends (build: create independent conda-forge packages to backends separately of the main one #2448). So where the backend is developed won't have an impact on users
  • External backends can have other maintainers, who may know the backend code better than the Ibis devs. That's the case of @tswast
  • Following from the previous point, the backend will be at least as well maintained in a separate repo as it is here. Probably better maintained, since Ibis maintainers won't be a bottleneck
  • Separate backends can be released more often if needed. If BigQuery adds an additional feature, it can be supported faster, by just releasing the backend, and not having to wait for an Ibis release
  • The Ibis CI is a bit heavy, with 17 builds, which besides taking time makes navigating the builds a bit annoying
  • In the case of BigQuery, it has special CI requirements, since access credentials can't be used in PR builds. Moving it elsewhere will simplify the CI here
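To make the entry-point argument concrete, here is a minimal sketch of how name-based backend discovery through entry points can work in general. It assumes an entry-point group named `ibis.backends` and a hypothetical `BackendRegistry` helper; it is not the loader merged in #2379, only an illustration of why moving a backend to another repository need not change the user-facing API: the backend package registers itself when installed, and the core library finds and imports it by name on first use.

```python
from importlib.metadata import EntryPoint, entry_points
from types import ModuleType
from typing import Dict, List

# Assumed entry-point group name; each backend package would register under it.
BACKEND_GROUP = "ibis.backends"


def installed_backend_entry_points(group: str = BACKEND_GROUP) -> Dict[str, EntryPoint]:
    """Map backend name -> EntryPoint for every installed package registered in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+: selectable entry points
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9: a dict mapping group -> list of entry points
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


class BackendRegistry:
    """Hypothetical helper: resolves backend names to modules, importing each lazily."""

    def __init__(self, eps: Dict[str, EntryPoint]) -> None:
        self._eps = dict(eps)
        self._loaded: Dict[str, ModuleType] = {}

    def available(self) -> List[str]:
        # Names of all backends whose packages are installed.
        return sorted(self._eps)

    def get(self, name: str) -> ModuleType:
        # Import the backend only the first time it is requested, then cache it.
        if name not in self._loaded:
            if name not in self._eps:
                raise AttributeError(
                    f"backend {name!r} is not installed; install its package with pip or conda"
                )
            self._loaded[name] = self._eps[name].load()
        return self._loaded[name]


if __name__ == "__main__":
    registry = BackendRegistry(installed_backend_entry_points())
    print("installed backends:", registry.available())
```

In this sketch, `load()` runs only when a backend is actually requested, so a backend's heavy dependencies (for BigQuery, the Google client libraries) are imported only by users who need them.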

The only drawback I would consider is losing the ability to make changes to both Ibis core and the backends at the same time. I'm currently working on a well-specified API for Ibis to communicate with backends. With a proper public API for backends, I think modifying both at the same time will become more of a liability than an asset. Any change in Ibis required by a backend should be made first, and separately from the changes to the backend code. It'll still take some weeks until we reach that point, but I think it's a good time to start considering moving more backends out. I think BigQuery is a good candidate: both the backend and Ibis itself should benefit from the move.

Thoughts, @jreback and @tswast?

@jreback
Contributor

jreback commented Mar 5, 2021

agree on all points here

we should extend the docs to point to additional backends

but I don't think we should have any testing in the main repo for them (to avoid crazy deps)

I wouldn't object to moving impala / clickhouse too, but let's start with omniscidb and bigquery; those seem good to me

@tswast
Collaborator

tswast commented Mar 5, 2021

Agreed.

> but I don't think we should have any testing in the main repo for them (to avoid crazy deps)

So long as we have some tests in the BQ repo against the HEAD of the core Ibis library, this sounds good to me.

@tswast
Collaborator

tswast commented Apr 9, 2021

ibis-project/ibis-bigquery#1 has some initial GitHub Actions setup. I have a few questions about it, but I think it's in at least the same shape as the BigQuery backend is in this repo now.

@renato2099
Contributor

Hey guys, are there any other tasks to complete this move? I guess we will also have to update the docs and so on?
I tried to connect to BigQuery using the "old" way, but that obviously doesn't work anymore ;)

bigquery_client = ibis.bigquery.connect(project_id="playground-project-302100", dataset_id="new_york_citibike")

so I wouldn't mind helping out with documenting the changes.

@datapythonista
Contributor Author

It should still work. It uses entry points, so you need to install the BigQuery backend package with pip, e.g. pip install -e <directory-with-setup.py>.
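For reference, this is roughly what the packaging side of such a backend can look like. The sketch lists keyword arguments that a backend's setup.py would pass to `setuptools.setup()`; the package name, the module name `ibis_bigquery`, the dependency list, and the entry-point group `ibis.backends` are all assumptions for illustration, so check the ibis-bigquery repository for the actual values.

```python
# Hypothetical packaging metadata for a separately released backend.
# These are keyword arguments for setuptools.setup() in the backend's setup.py;
# the group "ibis.backends", the module "ibis_bigquery" and the dependencies are assumptions.
SETUP_KWARGS = {
    "name": "ibis-bigquery",
    "packages": ["ibis_bigquery"],
    # Illustrative dependencies only.
    "install_requires": ["ibis-framework", "google-cloud-bigquery"],
    "entry_points": {
        # Format "<backend name> = <importable module>". Installing the package
        # (e.g. pip install -e .) records this in the package metadata.
        "ibis.backends": ["bigquery = ibis_bigquery"],
    },
}

# In a real setup.py this dict would then be passed to setuptools:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```

Once the package is installed, `importlib.metadata.entry_points()` reports a `bigquery` entry in that group, which is what lets the core library find the backend without it living in the same repository.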

For the documentation, there shouldn't be any change for the user other than installing the package separately, so there's not much to document, I'd say. But feel free to clarify or add anything you think makes sense.

Documenting the new backend API is something we'll have to do, but I'm still working on it, so I'm not sure what the final version will look like.

@renato2099
Contributor

Thanks @datapythonista! I had installed it with pip install "ibis-framework[bigquery]", which gave me ibis_framework-1.4.0. I then cloned the new repo https://github.com/ibis-project/ibis-bigquery and installed it through setup.py, and that worked without issues.

@datapythonista
Contributor Author

The backend has been moved to https://github.com/ibis-project/ibis-bigquery, closing.
