InvalidSchema? #3

Closed
shawnmjones opened this Issue Oct 5, 2016 · 3 comments

shawnmjones commented Oct 5, 2016

With the latest commit (e41e3a8), when running the Docker instance of CarbonDate in local mode, as per http://ws-dl.blogspot.com/2016/09/2016-09-20-carbon-dating-web-version-30.html, I occasionally get an exception like the following.

# sudo docker run --rm -it carbon ./main.py -l search http://www.google.com
cdGetBitly.py::GetBitlyJson(), please set bitly access token in config
(<class 'requests.exceptions.InvalidSchema'>, InvalidSchema("No connection adapters were found for 'hive.org.uk/wayback/archive/20080304103855/http://www.google.com/'",), <traceback object at 0x7f6baa816088>)
Traceback (most recent call last):
  File "/usr/src/app/modules/cdGetArchives.py", line 135, in getArchives
    date = getRealDate(archives[archive]["link"],archives[archive]["time"])
  File "/usr/src/app/modules/cdGetArchives.py", line 85, in getRealDate
    response = requests.get(url,headers=headers)
  File "/usr/local/lib/python3.5/site-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/requests/sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.5/site-packages/requests/sessions.py", line 590, in send
    adapter = self.get_adapter(url=request.url)
  File "/usr/local/lib/python3.5/site-packages/requests/sessions.py", line 672, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for 'hive.org.uk/wayback/archive/20080304103855/http://www.google.com/'
runtime in seconds:  13
{
  "URI": "http://www.google.com",
  "Estimated Creation Date": "2003-01-14T00:00:00",
  "Bitly.com": "",
  "Google.com": "2003-01-14T00:00:00",
  "Bing.com": "",
  "Pubdate tag": "",
  "Last Modified": "",
  "Archives": [
    [
      "Earliest",
      ""
    ],
    [
      "By_Archive",
      {}
    ]
  ],
  "Twitter.com": "2006-04-13T02:58:51",
  "Backlinks": ""
}

Member

ibnesayeed commented Oct 5, 2016

As far as I remember, there is a -e flag to exclude modules. This exception is happening because you did not provide a Bitly key and did not exclude that module. However, I agree that this needs better documentation in the README, and that the exception should be caught and handled gracefully with a friendlier message or written to STDERR.

/cc @DarkAngelZT

shawnmjones commented Oct 5, 2016

Well, I would agree, but I always get the "please set bitly access token in config" Bitly message, and do not always get this exception.

# sudo docker run --rm -it carbon ./main.py -l search http://www.cs.odu.edu
cdGetBitly.py::GetBitlyJson(), please set bitly access token in config
runtime in seconds:  8
{
  "URI": "http://www.cs.odu.edu",
  "Estimated Creation Date": "1997-03-24T17:29:34",
  "Pubdate tag": "",
  "Archives": [
    [
      "Earliest",
      "1997-03-24T17:29:34"
    ],
    [
      "By_Archive",
      {
        "http://web.archive.bibalex.org:80/web/20010414022512/http://www.cs.odu.edu/": "2001-03-23T14:55:45",
        "http://arquivo.pt/wayback/20091223043049/http://www.cs.odu.edu/": "2009-12-23T04:30:50",
        "http://web.archive.org/web/19971010201632/http://www.cs.odu.edu/": "1997-03-24T17:29:34",
        "http://archive.is/19970606105039/http://www.cs.odu.edu/": "1997-06-06T06:50:39",
        "http://webcitation.org/query?id=1327284086752784": "2012-01-22T21:01:29"
      }
    ]
  ],
  "Bitly.com": "",
  "Backlinks": "",
  "Last Modified": "",
  "Twitter.com": "2008-12-01T08:53:27",
  "Google.com": "2015-06-02T00:00:00",
  "Bing.com": ""
}

Instead, as mentioned in the requests.exceptions.InvalidSchema exception thrown by the Python requests module, the problem appears to be that something along the way discovered a memento at URI hive.org.uk/wayback/archive/20080304103855/http://www.google.com/ and this URI does not have a scheme (i.e., no "http://", "https://", etc.), causing the requests.get on line 85 of modules/cdGetArchives.py to fail.
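
One possible way to guard against this is to normalize each memento URI before it reaches requests.get. The following is only a sketch: `ensure_scheme` is a hypothetical helper, not something in CarbonDate, and defaulting to "http" for scheme-less URIs is an assumption about what the archive intended.

```python
import re

# Matches a leading URI scheme followed by "//", e.g. "http://" or "https://".
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def ensure_scheme(url, default_scheme="http"):
    """Return url unchanged if it starts with a scheme, else prepend one.

    Hypothetical helper; the default scheme is an assumption.
    """
    url = url.strip()
    if _SCHEME_RE.match(url):
        return url
    # lstrip("/") also covers protocol-relative URIs such as "//host/path".
    return "{}://{}".format(default_scheme, url.lstrip("/"))


if __name__ == "__main__":
    broken = "hive.org.uk/wayback/archive/20080304103855/http://www.google.com/"
    print(ensure_scheme(broken))
```

Calling something like this on the link just before the requests.get in getRealDate would let the request proceed instead of raising InvalidSchema, though whether the repaired URI actually resolves still depends on the archive.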

Member

ibnesayeed commented Oct 5, 2016

Well, that means a sanity check needs to be added (and the URI fixed) before making the requests call.
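
A rough sketch of what such a sanity check could look like, using only the standard library. The names `is_fetchable` and `partition_mementos` are hypothetical and not part of CarbonDate, and treating only http and https as acceptable is an assumption.

```python
import sys
from urllib.parse import urlsplit


def is_fetchable(url):
    """True if url has an http(s) scheme and a host, so requests can handle it."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def partition_mementos(links):
    """Split links into (usable, rejected) and report rejected ones on stderr."""
    usable, rejected = [], []
    for link in links:
        (usable if is_fetchable(link) else rejected).append(link)
    for link in rejected:
        # A short message on stderr instead of a traceback.
        print("skipping memento without a usable scheme: {}".format(link),
              file=sys.stderr)
    return usable, rejected


if __name__ == "__main__":
    links = [
        "http://arquivo.pt/wayback/20091223043049/http://www.cs.odu.edu/",
        "hive.org.uk/wayback/archive/20080304103855/http://www.google.com/",
    ]
    usable, rejected = partition_mementos(links)
    print(len(usable), len(rejected))
```

Only the usable links would then be passed on to the code that calls requests.get, so one malformed memento URI would not empty the whole "Archives" result.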

@ibnesayeed ibnesayeed closed this in #4 Jul 11, 2017
