Fetching data fails #12

Closed
pbenner opened this issue Apr 27, 2023 · 4 comments · Fixed by #13
Labels
bug Something isn't working

Comments

@pbenner
Collaborator

pbenner commented Apr 27, 2023

Fetching the data currently fails:

Traceback (most recent call last):
  File "/home/pbenner/Source/tmp/matbench-discovery/data/wbm/fetch_process_wbm_dataset.py", line 184, in <module>
    urllib.request.urlretrieve(f"{mat_cloud_url}&{filename=}", file_path)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/urllib/request.py", line 241, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: NOT FOUND

urllib.request.urlretrieve(f"{mat_cloud_url}&{filename=}", file_path)

@janosh
Owner

janosh commented Apr 28, 2023

Ah, just some extra quotes: {filename=} expands to filename='<filename>' (the repr of the string, quotes included), but the URL needs filename=<filename>. Easy fix.
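
To illustrate the difference, here is a minimal sketch. The base URL and file name are made up for the example and are not the values used in fetch_process_wbm_dataset.py; only the f-string behavior is the point.

```python
from urllib.parse import urlencode

# Hypothetical values, for illustration only.
base_url = "https://example.com/download?record=123"
filename = "wbm-summary.txt"

# The "=" specifier prints the expression name and the repr() of its value,
# so the string value arrives wrapped in quotes.
with_quotes = f"{base_url}&{filename=}"

# Writing the key by hand keeps the value unquoted.
without_quotes = f"{base_url}&filename={filename}"

# urlencode builds the same query fragment and also escapes unsafe characters.
encoded = f"{base_url}&{urlencode({'filename': filename})}"

print(with_quotes)
print(without_quotes)
print(encoded)
```

The first URL ends in filename='wbm-summary.txt', which is presumably what the server answered with a 404; the other two end in filename=wbm-summary.txt.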

@janosh
Owner

janosh commented Apr 28, 2023

@pbenner Could you do a source install from main and let me know if it's working now? If so, I'll cut a new PyPI release.

@pbenner
Collaborator Author

pbenner commented Apr 28, 2023

This particular problem seems to be solved. However, there are further issues:

n_too_stable = 502
n_too_unstable = 22
Traceback (most recent call last):
  File "/home/pbenner/Source/tmp/matbench-discovery/data/wbm/fetch_process_wbm_dataset.py", line 473, in <module>
    save_fig(fig, f"{img_path}.svelte")
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/site-packages/pymatviz/utils.py", line 308, in save_fig
    fig.write_html(path, **defaults)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/site-packages/plotly/basedatatypes.py", line 3708, in write_html
    return pio.write_html(self, *args, **kwargs)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/site-packages/plotly/io/_html.py", line 536, in write_html
    path.write_text(html_str)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/pathlib.py", line 1154, in write_text
    with self.open(mode='w', encoding=encoding, errors=errors, newline=newline) as f:
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/pathlib.py", line 1119, in open
    return self._accessor.open(self, mode, buffering, encoding, errors,
FileNotFoundError: [Errno 2] No such file or directory: '/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/site-packages/site/src/figs/hist-wbm-e-form-per-atom.svelte'

After manually creating the [...]/src/figs directory, the next issue is the following:

Traceback (most recent call last):
  File "/home/pbenner/Source/tmp/matbench-discovery/data/wbm/fetch_process_wbm_dataset.py", line 538, in <module>
    with gzip.open(DATA_FILES.mp_patched_phase_diagram, "rb") as zip_file:
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/gzip.py", line 58, in open
    binary_file = GzipFile(filename, gz_mode, compresslevel)
  File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/gzip.py", line 174, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/site-packages/data/mp/2023-02-07-ppd-mp.pkl.gz'

The [...]/data/mp directory exists, but the pkl file is missing.
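
Both errors above are FileNotFoundError cases that a script could guard against. The following is only a sketch of one possible approach, not what matbench-discovery does: the helper names ensure_parent_dir and load_gzipped_pickle are hypothetical, and it assumes the missing file is a gzipped pickle, as the .pkl.gz extension suggests.

```python
import gzip
import pickle
from pathlib import Path


def ensure_parent_dir(path):
    """Create the parent directory of path (if needed) and return path as a Path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


def load_gzipped_pickle(path):
    """Load a gzipped pickle, failing early with a clearer message if it is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} is missing; download it before running this step")
    with gzip.open(path, "rb") as zip_file:
        return pickle.load(zip_file)
```

Calling ensure_parent_dir on a figure path before writing to it avoids the first error. The second helper does not fix the missing data file, it only reports it more directly.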

@janosh
Owner

janosh commented Apr 28, 2023

Thanks. Let's track that in a new issue. I created #14 from your comment. I need to take a closer look at how best to manage all the data files the MBD analysis relies on in pip installations.

@janosh janosh closed this as completed in 308042f Apr 28, 2023
janosh added a commit that referenced this issue Jun 20, 2023
…d Figshare downloads (#13)

* use load_train_test() to load wbm-summary in data.py (closes #10)

* fetch_process_wbm_dataset.py wrap urllib.request.urlretrieve in try/except (closes #12)

* add scripts/upload_to_figshare.py for publishing data files to figshare

* add data/figshare dir with readme and FIGSHARE in __init__.py

* change load_train_test() to load files from figshare instead of GitHub (closes #11)

* class Files issue warning when accessing a file path that doesn't exist

* docs recommend --depth 1 for git clone

* fix tests/test_data.py

* add auto-generated data/figshare/1.0.0.json

* pyproject.toml drop unused [tool.setuptools.package-data] matbench_discovery = ["data/mp/*.json"]

* fix AttributeError: 'DataFrame' object has no attribute 'material_id'
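
The commit message mentions wrapping urllib.request.urlretrieve in try/except, but the actual change is not shown in this thread. As a rough sketch of what such a wrapper might look like (the function name fetch_file and the choice to return a bool are assumptions, not the project's code):

```python
import urllib.error
import urllib.request


def fetch_file(url, file_path):
    """Download url to file_path; return True on success, False on failure."""
    try:
        urllib.request.urlretrieve(url, file_path)
    except urllib.error.URLError as exc:
        # HTTPError (e.g. the 404 reported above) is a subclass of URLError,
        # so this also covers failed HTTP responses.
        print(f"Failed to download {url}: {exc}")
        return False
    return True
```

Catching URLError keeps a single failed download from aborting the whole script, at the cost of requiring the caller to check the return value.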