mldata.org is down (for good?) #8588
Comments
I'm having the same problem -- the scikit-learn function as well as trying to go directly to mldata.org ... Any thoughts? |
I suppose we should be deprecating mldata fetchers :(
…On 20 March 2017 at 12:48, Ed Williams ***@***.***> wrote:
I'm having the same problem -- the scikit-learn function as well as trying
to go directly to mldata.org ...
Any thoughts?
|
Is there any indication that mldata.org is not going to be back up any time soon? I did not find anything from a quick googling. Also, wild guessing a bit here, but I was mixing up mldata.org with mlcomp.org at first, which is supposed to be taken down in March 2017; maybe it is the same for you @jnothman. |
Yes, I did confuse them. But 5 days is also a surprising downtime for
something of this nature!
…On 20 March 2017 at 19:36, Loïc Estève ***@***.***> wrote:
I suppose we should be deprecating mldata fetchers :(
Is there any indication that mldata.org is not going to be back up any
time soon ? I did not find anything from a quick googling. Also wild
guessing a bit here, but I was mixing mldata.org with mlcomp.org at
first, which is supposed to be taken down in March 2017 maybe it is the
same for you @jnothman <https://github.com/jnothman>.
|
Agreed. I'll try to contact one of the website maintainers I found through Google and see what happens. |
fwiw nothing on the archive.org copy from March foreshadowed the outage
…On 20 Mar 2017 11:36 pm, "Loïc Estève" ***@***.***> wrote:
5 days is also a surprising downtime for something of this nature!
Agreed. I'll try to contact one of the website maintainers I found through
Google and see what happens.
|
Until mldata.org comes back up, what's the workaround if the data has not already been downloaded? |
Have you tried googling the name of the dataset? You may find a copy somewhere. In principle you just need to find the Alternatively, find someone (or maybe yourself on another computer) who uses scikit-learn and has already downloaded it, and ask them to share the content of their |
I tried googling but could not find the .mat file. However, I found a link to the data I was looking for in the book from which the data was taken. With this data, though, I am not able to reproduce the result shown on the sklearn user guide page. That is why I wanted the data in the exact format, so that I can be sure the bad result is due to the data and not to the modelling. |
I googled and found https://github.com/amplab/datascience-sp14/blob/master/lab7/mldata/mnist-original.mat in a matter of seconds. I was able to reproduce the output of this example: Hope this helps although it is hard to know because you are not very explicit about what you are trying to do ... |
@lesteve sorry for being a bit vague! I am trying to replicate the Gaussian Process Regressor example - http://scikit-learn.org/stable/auto_examples/gaussian_process/plot_gpr_co2.html |
Hello there, I'll try to get in touch with mldata's admin. I'll let you know about the updates. It should in theory be provided indefinitely... |
FYI, mldata.org is still down. |
While racking my brain to get the data, I found this GitHub repository, which contains most of the data - https://github.com/vincentarelbundock/Rdatasets However, this involves downloading an .Rda or csv file and converting it into a format that sklearn can consume. |
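A rough sketch of that conversion step for a CSV file, using only the standard library. The function name `csv_to_xy`, the column names `time` and `value`, and the sample rows are placeholders (assumptions), not taken from the Rdatasets repository; adjust them to the header of the file you actually download.

```python
import csv
import io


def csv_to_xy(text, feature_cols, target_col):
    """Parse CSV text into (X, y) lists of floats, skipping incomplete rows."""
    X, y = [], []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            features = [float(row[c]) for c in feature_cols]
            target = float(row[target_col])
        except (ValueError, TypeError):
            # Non-numeric markers such as "NA" raise ValueError,
            # short rows yield None and raise TypeError: skip both.
            continue
        X.append(features)
        y.append(target)
    return X, y


# Hypothetical two-column sample standing in for a downloaded CSV file.
sample = "time,value\n1959.0,315.42\n1959.1,NA\n1959.2,316.31\n"
X, y = csv_to_xy(sample, feature_cols=["time"], target_col="value")
```

The resulting nested lists are plain array-likes, so they can be wrapped with `numpy.asarray` or passed to an estimator's `fit` method. A misspelled column name raises `KeyError` on purpose rather than silently dropping every row.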
mldata.org (and also mloss.org unfortunately) servers were very sick... we're on it. |
The way these things are going, I wish we had some way to not rely on the
availability of a single unassured host. Ugh. Torrents anyone?
…On 29 Mar 2017 4:54 am, "(Venkat) Raghav (Rajagopalan)" < ***@***.***> wrote:
@jnothman <https://github.com/jnothman> @amueller
<https://github.com/amueller> @mfeurer <https://github.com/mfeurer> Time
to add openml.org fetcher? ;)
|
Or mirrors? |
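One way to picture the mirror idea: try a list of URLs in order and accept the first download whose checksum matches. This is only a sketch; the function name `fetch_from_mirrors` and the `opener` parameter are hypothetical, and nothing in this thread suggests scikit-learn ships anything like it.

```python
import hashlib
import urllib.request


def fetch_from_mirrors(urls, expected_sha256, opener=None, timeout=30):
    """Return the bytes from the first mirror whose content matches the checksum."""
    if opener is None:
        def opener(url):
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()

    errors = []
    for url in urls:
        try:
            data = opener(url)
        except OSError as exc:
            # URLError and connection timeouts are OSError subclasses.
            errors.append((url, repr(exc)))
            continue
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
        errors.append((url, "checksum mismatch"))
    raise RuntimeError("all mirrors failed: %r" % errors)
```

Checking a hash before accepting the data means a mirror does not have to be trusted, only reachable; the `opener` argument exists so the fallback logic can be exercised without network access.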
I like the torrents solution as it's decentralized but unfortunately there are many corporate and institutional environments where the bittorrent protocol is banned... That being said http://academictorrents.com is a great tracker, especially if you are interested in fetching large datasets like MSCOCO, ImageNet and OpenImages. |
Anyone looking specifically for code on getting the MNIST data set can use this from Tensorflow:
|
@mikiobraun Any news? It seems that mldata.org is returning "Page unavailable" for every request. AFAICT, mldata.org was managed by the European project PASCAL2, which was closed about 3 years ago. Who is in charge (and paying) for the servers now? |
Hi @ageron. Service is hosted by TU Berlin who agreed to keep the service running indefinitely (it is essentially a single instance VM with some NAS attached disk storage). Admin at TU Berlin is on it... Thanks for your patience... |
Thanks for your feedback @mikiobraun. |
In case someone needs this, here's a function that downloads MNIST from another source and stores it in the default location where scikit-learn stores mldata datasets (
Here's an example:
|
Keras uploaded the dataset to S3. Amazon charges for S3 based on traffic (requests and outbound data transfer). Another approach is to upload the dataset to a GitHub/GitLab repo and use the HTTP URL in the code. |
BTW, is the list of all the datasets hosted at mldata.org available somewhere? |
mldata.org is back up. I agree that there are challenges with keeping the response time small when it does go down though. We will talk to openml to see how we can migrate/mirror mldata there. The tricky bit is to try to retain the versions. |
@chengsoonong thanks! Closing this one. |
Can we re-open this issue now that it appears to be back offline? |
@omtinez what makes you think mldata.org is offline? For example this works fine for me (I made sure the data was not cached locally by deleting
from sklearn.datasets import fetch_mldata
fetch_mldata('MNIST original') |
It appears to be back online now, albeit working very slowly for me... |
It appears down for me at this moment. Edit: back up now, but this definitely needs something more stable. |
@ThomasDelteil there is some ongoing work on an OpenML fetcher. You are more than welcome to help on the #9908 PR, e.g. by reviewing, trying it out, giving feedback etc ... |
Added a workaround to download MNIST data since mldata.org keeps going down (scikit-learn/scikit-learn#8588)
For those whose local file is not working, try creating a new notebook and doing the same thing. |
I'm not sure what @nakebull means. At least we now have fetch_openml, although openml and mldata have different datasets, and openml delivers data in a text-based format that is more flexible but slower to load. |
I'll try to get all mldata.org datasets into OpenML (the mldata folks agreed to this). At the moment, I sadly can't reach the mldata server. Did anyone ever download all of them? That would be a huge help. Thanks! |
@joaquinvanschoren I just emailed you the address of someone I have contacted in the past about mldata.org problems and who has been very helpful each time. Let me know if you don't receive my email. |
You can download the code from THE MNIST DATABASE |
You can get mnist from openml
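The snippet from this comment is not preserved. A minimal sketch of the idea, assuming scikit-learn 0.20 or later (where `fetch_openml` is available) and the OpenML dataset name `mnist_784`; the wrapper name `load_mnist_from_openml` is hypothetical:

```python
def load_mnist_from_openml(data_home=None):
    """Fetch MNIST from OpenML, or read it from the local scikit-learn cache."""
    # Imported lazily so that defining this helper does not require scikit-learn.
    from sklearn.datasets import fetch_openml

    # "mnist_784" is the OpenML name for MNIST; version=1 pins the dataset
    # in case several versions are registered under the same name.
    return fetch_openml("mnist_784", version=1, return_X_y=True, data_home=data_home)


# Usage (needs network access on the first call):
#   X, y = load_mnist_from_openml()
```

The labels typically come back as strings rather than integers, so cast them if downstream code expects numeric targets.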
|
Thanks for your solution! @lesteve, It works after I put https://github.com/amplab/datascience-sp14/blob/master/lab7/mldata/mnist-original.mat into ~/scikit_learn_data/mldata/.
BTW, does anyone know why http://mldata.org/ was down for so long?
Gravity. Things go down if no one keeps them up.
|
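The manual workaround confirmed in this thread (placing `mnist-original.mat` under `~/scikit_learn_data/mldata/`) can be sketched as follows. The helper names are hypothetical, and the cache layout is taken from the comments here rather than from scikit-learn's documentation.

```python
import os
import shutil


def mldata_cache_path(dataset_file="mnist-original.mat", data_home=None):
    """Build the cache path reported in this thread for mldata files."""
    if data_home is None:
        # Default location mentioned in the thread: ~/scikit_learn_data
        data_home = os.path.join(os.path.expanduser("~"), "scikit_learn_data")
    return os.path.join(data_home, "mldata", dataset_file)


def install_mldata_file(downloaded_file, data_home=None):
    """Copy a manually downloaded .mat file into the mldata cache directory."""
    target = mldata_cache_path(os.path.basename(downloaded_file), data_home)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.copyfile(downloaded_file, target)
    return target
```

Note that the GitHub `blob` URL quoted in the thread serves an HTML page, so the file has to be saved through GitHub's raw/download link before copying it into place.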
Description
Unable to retrieve dataset from mldata.org
The site is down.
Steps/Code to Reproduce
Expected Results
[mnist data loaded]
Actual Results
TimeoutError Traceback (most recent call last)
C:\Users\frada\Dev\Python\Miniconda3\lib\urllib\request.py in do_open(self, http_class, req, **http_conn_args)
1317 h.request(req.get_method(), req.selector, req.data, headers,
-> 1318 encode_chunked=req.has_header('Transfer-encoding'))
1319 except OSError as err: # timeout error
C:\Users\frada\Dev\Python\Miniconda3\lib\http\client.py in request(self, method, url, body, headers, encode_chunked)
1238 """Send a complete request to the server."""
-> 1239 self._send_request(method, url, body, headers, encode_chunked)
1240
C:\Users\frada\Dev\Python\Miniconda3\lib\http\client.py in _send_request(self, method, url, body, headers, encode_chunked)
1284 body = _encode(body, 'body')
-> 1285 self.endheaders(body, encode_chunked=encode_chunked)
1286
C:\Users\frada\Dev\Python\Miniconda3\lib\http\client.py in endheaders(self, message_body, encode_chunked)
1233 raise CannotSendHeader()
-> 1234 self._send_output(message_body, encode_chunked=encode_chunked)
1235
C:\Users\frada\Dev\Python\Miniconda3\lib\http\client.py in _send_output(self, message_body, encode_chunked)
1025 del self._buffer[:]
-> 1026 self.send(msg)
1027
C:\Users\frada\Dev\Python\Miniconda3\lib\http\client.py in send(self, data)
963 if self.auto_open:
--> 964 self.connect()
965 else:
C:\Users\frada\Dev\Python\Miniconda3\lib\http\client.py in connect(self)
935 self.sock = self._create_connection(
--> 936 (self.host,self.port), self.timeout, self.source_address)
937 self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
C:\Users\frada\Dev\Python\Miniconda3\lib\socket.py in create_connection(address, timeout, source_address)
721 if err is not None:
--> 722 raise err
723 else:
C:\Users\frada\Dev\Python\Miniconda3\lib\socket.py in create_connection(address, timeout, source_address)
712 sock.bind(source_address)
--> 713 sock.connect(sa)
714 return sock
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
Versions
Windows-10-10.0.14393-SP0
Python 3.6.0 |Continuum Analytics, Inc.| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)]
NumPy 1.12.0
SciPy 0.19.0
Scikit-Learn 0.18.1