Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic support for remote resources to read_file #531

Merged
merged 5 commits into from Sep 14, 2017
Merged

Add basic support for remote resources to read_file #531

merged 5 commits into from Sep 14, 2017

Conversation

rgieseke
Copy link
Contributor

@rgieseke rgieseke commented Sep 1, 2017

Tested with GeoJSON files.

@rgieseke
Copy link
Contributor Author

rgieseke commented Sep 1, 2017

This might need more work, let me know if you have any suggestions!

(Started at EuroScipy 2017 sprint, see also the related issue #441)

@rgieseke
Copy link
Contributor Author

rgieseke commented Sep 1, 2017

I've added a custom marker for pytest, so one can use

pytest geopandas/io/tests -m "webtest"

and

pytest geopandas/io/tests -m "not webtest"

Maybe there is already someone similar set-up already?

@codecov
Copy link

codecov bot commented Sep 1, 2017

Codecov Report

Merging #531 into master will decrease coverage by 0.06%.
The diff coverage is 91.3%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #531      +/-   ##
==========================================
- Coverage   93.73%   93.66%   -0.07%     
==========================================
  Files          14       14              
  Lines        1037     1058      +21     
==========================================
+ Hits          972      991      +19     
- Misses         65       67       +2
Impacted Files Coverage Δ
geopandas/io/file.py 94.28% <91.3%> (-1.64%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7101fc9...13664f8. Read the comment docs.

@rgieseke
Copy link
Contributor Author

rgieseke commented Sep 1, 2017

Related: #464

be opened and *kwargs* are keyword args to be passed to the `open`
or (`BytesCollection`) method in the fiona library when opening the file.
For more information on possible keywords, type:
``import fiona; help(fiona.open)``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be valuable to expand the docs a bit here now with this added functionality. In the from_file method on a geodataframe, which wraps read_file, there's a basic example. Something like that would be useful here. Additionally, putting it in numpy style to help with effort to make formatting of docs consistent.

@rgieseke
Copy link
Contributor Author

rgieseke commented Sep 5, 2017

I was trying to add a URL example as well but I can't find a simple one with a short URL - I think I'll add something to the docs which then could be served via http://geopandas.org/

@rgieseke
Copy link
Contributor Author

rgieseke commented Sep 5, 2017

Just fixed some syntax in the docs, should be ok now.


gpd.read_file("https://d2ad6b4ur7yvpq.cloudfront.net/naturalearth-3.3.0/ne_110m_land.geojson")
url = "http://geopandas.orgd2ad6b4ur7yvpq.cloudfront.net/naturalearth-3.3.0/ne_110m_land.geojson"
gpd.read_file(url)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The url you changed it to isn't valid; it also isn't quite as clear anymore with the preceding sentence about reading a GeoJSON file from geojson.xyz (which is the case for the original url, though given that geojson.xyz doesn't appear in the url anywhere, it is not totally transparent then either).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I had started thinking about using a "geopandas.org" URL, then realized it would probably not be a nice URL either and concentrated on fixing the syntax errors.

I amended the commit and all should now work again!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you push the change? I am still seeing the url with geopandas in the diff online?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just did, Monday morning ... (thanks for the heads-up!)

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
Added a few minor comments

@@ -19,6 +19,12 @@ Any arguments passed to ``read_file()`` after the file name will be passed direc

Among other things, one can explicitly set the driver (shapefile, GeoJSON) with the ``driver`` keyword, or pick a single layer from a multi-layered file with the ``layer`` keyword.

Where supported in ``fiona`` *geopandas* can also load resources directly from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put a comma between fiona and geopandas

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


Examples
--------
>>> df = geopandas.read_file("nybb.shp")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no indentation is needed here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -54,6 +54,13 @@ def test_read_file(self):
lower_columns = [c.lower() for c in self.columns]
assert (df.columns[:-1] == lower_columns).all()

@pytest.mark.webtest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just "web" instead of "webtest" ?
But it's fine to add a mark like this, we don't have anything set up like this yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, had just taken it from the py.test example in their docs.

@jdmcbr jdmcbr merged commit a43d166 into geopandas:master Sep 14, 2017
@jdmcbr
Copy link
Member

jdmcbr commented Sep 14, 2017

Thanks @rgieseke!

@rgieseke
Copy link
Contributor Author

Thanks @jdmcbr and @jorisvandenbossche !

@rgieseke rgieseke deleted the remote-geojson branch September 14, 2017 04:24
jorisvandenbossche pushed a commit to jorisvandenbossche/geopandas that referenced this pull request Sep 18, 2017
* Add basic support for remote resources to `read_file`

Tested with GeoJSON files.

* TST: Use GeoJSON file from GeoPandas repo

* Switch to numpydoc style

* Fix syntax in docs

* Formatting and rename test marker

(cherry picked from commit a43d166)
jorisvandenbossche pushed a commit that referenced this pull request Sep 18, 2017
* Add basic support for remote resources to `read_file`

Tested with GeoJSON files.

* TST: Use GeoJSON file from GeoPandas repo

* Switch to numpydoc style

* Fix syntax in docs

* Formatting and rename test marker

(cherry picked from commit a43d166)
@linwoodc3
Copy link

When will the pypi version of geopandas include this feature? I installed from the master branch and was able to load the remote file. However, I couldn't read the remote file from the pypi pip install geopandas version.

Also, how feasible is it to add an optional argument for a cert file to handle requests to an https file?

@rgieseke
Copy link
Contributor Author

On the https - good question, what Python version are you using?

For Python versions earlier than 2.7.9, urllib does not attempt to validate the server certificates of HTTPS URIs.

https://docs.python.org/2/library/urllib.html

@linwoodc3
Copy link

Always above 2.7.9 and up to 3.6. Anaconda versions.

@rgieseke
Copy link
Contributor Author

I might have mis-understood your use-case, do you want to use a local cert file?

@linwoodc3
Copy link

Yes, you are correct.

This is the requests equivalent:

import requests

cert = <path to cert>
r = requests.get(url,cert =cert, verify=False)

Documented here: http://docs.python-requests.org/en/master/user/advanced/#client-side-certificates

If we can pass a cert argument in with this, it would do the trick I think. The only thing is, I've never done this in urllib..but sure there's a way.

@jorisvandenbossche
Copy link
Member

There has recently been some discussion about this on the pandas issue tracker: pandas-dev/pandas#16716 (and subsequent PRs). Can be used as inspiration or reused what comes out of that

@dzbee dzbee mentioned this pull request Nov 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants