InvalidSchema? #3

Closed
shawnmjones opened this issue Oct 5, 2016 · 3 comments

@shawnmjones
Member

With the latest commit (e41e3a8), when running the Docker instance of CarbonDate in local mode, as per http://ws-dl.blogspot.com/2016/09/2016-09-20-carbon-dating-web-version-30.html, I occasionally get an exception like the following.

# sudo docker run --rm -it carbon ./main.py -l search http://www.google.com
cdGetBitly.py::GetBitlyJson(), please set bitly access token in config
(<class 'requests.exceptions.InvalidSchema'>, InvalidSchema("No connection adapters were found for 'hive.org.uk/wayback/archive/20080304103855/http://www.google.com/'",), <traceback object at 0x7f6baa816088>)
Traceback (most recent call last):
  File "/usr/src/app/modules/cdGetArchives.py", line 135, in getArchives
    date = getRealDate(archives[archive]["link"],archives[archive]["time"])
  File "/usr/src/app/modules/cdGetArchives.py", line 85, in getRealDate
    response = requests.get(url,headers=headers)
  File "/usr/local/lib/python3.5/site-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/requests/sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.5/site-packages/requests/sessions.py", line 590, in send
    adapter = self.get_adapter(url=request.url)
  File "/usr/local/lib/python3.5/site-packages/requests/sessions.py", line 672, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for 'hive.org.uk/wayback/archive/20080304103855/http://www.google.com/'
runtime in seconds:  13
{
  "URI": "http://www.google.com",
  "Estimated Creation Date": "2003-01-14T00:00:00",
  "Bitly.com": "",
  "Google.com": "2003-01-14T00:00:00",
  "Bing.com": "",
  "Pubdate tag": "",
  "Last Modified": "",
  "Archives": [
    [
      "Earliest",
      ""
    ],
    [
      "By_Archive",
      {}
    ]
  ],
  "Twitter.com": "2006-04-13T02:58:51",
  "Backlinks": ""
}
@ibnesayeed
Member

As far as I remember, there is a -e flag to exclude modules. This exception is happening because you did not provide a Bitly key and did not exclude that module. However, I do understand that this needs better documentation in the README, and that the exception should be caught and handled gracefully with a friendlier message or STDERR output.

/cc @DarkAngelZT
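
A minimal sketch of the "catch it and report on STDERR" idea, not CarbonDate's actual code. The helper name `fetch_or_warn` and the `fetch` callable are hypothetical; in the real module the callable would presumably wrap the `requests.get(url,headers=headers)` call shown in the traceback.

```python
import sys


def fetch_or_warn(fetch, link):
    """Call fetch(link); on failure, warn on stderr and return None.

    `fetch` is a hypothetical stand-in for whatever performs the HTTP
    request, so this sketch runs without network access.
    """
    try:
        return fetch(link)
    except (OSError, ValueError) as exc:
        # One short line on stderr instead of a full traceback.
        print("warning: skipping {!r}: {}".format(link, exc), file=sys.stderr)
        return None


def failing_fetch(link):
    # Simulates a request that cannot be made for this link.
    raise ValueError("No connection adapters were found for {!r}".format(link))


result = fetch_or_warn(failing_fetch, "hive.org.uk/wayback/archive/20080304103855/http://www.google.com/")
print(result)
```

As far as I know, the exceptions raised by requests derive from `IOError`, so the `OSError` clause should cover them, but catching `requests.exceptions.RequestException` explicitly would be the clearer choice in the real code.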

@shawnmjones
Member Author

shawnmjones commented Oct 5, 2016

Well, I would agree, but I always get the "please set bitly access token in config" Bitly message, and do not always get this exception.

# sudo docker run --rm -it carbon ./main.py -l search http://www.cs.odu.edu
cdGetBitly.py::GetBitlyJson(), please set bitly access token in config
runtime in seconds:  8
{
  "URI": "http://www.cs.odu.edu",
  "Estimated Creation Date": "1997-03-24T17:29:34",
  "Pubdate tag": "",
  "Archives": [
    [
      "Earliest",
      "1997-03-24T17:29:34"
    ],
    [
      "By_Archive",
      {
        "http://web.archive.bibalex.org:80/web/20010414022512/http://www.cs.odu.edu/": "2001-03-23T14:55:45",
        "http://arquivo.pt/wayback/20091223043049/http://www.cs.odu.edu/": "2009-12-23T04:30:50",
        "http://web.archive.org/web/19971010201632/http://www.cs.odu.edu/": "1997-03-24T17:29:34",
        "http://archive.is/19970606105039/http://www.cs.odu.edu/": "1997-06-06T06:50:39",
        "http://webcitation.org/query?id=1327284086752784": "2012-01-22T21:01:29"
      }
    ]
  ],
  "Bitly.com": "",
  "Backlinks": "",
  "Last Modified": "",
  "Twitter.com": "2008-12-01T08:53:27",
  "Google.com": "2015-06-02T00:00:00",
  "Bing.com": ""
}

Instead, as the requests.exceptions.InvalidSchema exception raised by the Python requests module indicates, the problem appears to be that something along the way discovered a memento at URI hive.org.uk/wayback/archive/20080304103855/http://www.google.com/. This URI has no scheme (i.e., no "http://", "https://", etc.), which causes the requests.get call on line 85 of modules/cdGetArchives.py to fail.
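
To illustrate the missing scheme, here is a small standard-library check. The function name `has_web_scheme` is made up for this example and is not part of CarbonDate.

```python
from urllib.parse import urlsplit


def has_web_scheme(uri):
    """Return True if uri starts with an explicit http or https scheme."""
    return urlsplit(uri).scheme in ("http", "https")


# URI copied from the traceback; nothing precedes the host name.
bad = "hive.org.uk/wayback/archive/20080304103855/http://www.google.com/"
# A memento URI from the successful run, for comparison.
good = "http://web.archive.org/web/19971010201632/http://www.cs.odu.edu/"

print(has_web_scheme(bad))   # False
print(has_web_scheme(good))  # True
```

The embedded "http://" later in the path does not count as a scheme, because a scheme has to come before the first "/" in the URI.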

@ibnesayeed
Member

Well, that means some sanity check (and fix) needs to be put in place before making the requests call.
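
One possible shape for such a check, as a sketch only. `ensure_scheme` is a hypothetical helper, and defaulting to "http" is an assumption: it presumes the archive in question answers over plain HTTP, which this thread does not confirm.

```python
from urllib.parse import urlsplit


def ensure_scheme(uri, default="http"):
    """Return uri unchanged if it already has an http(s) scheme;
    otherwise prepend `default` (an assumed fallback) as the scheme."""
    if urlsplit(uri).scheme in ("http", "https"):
        return uri
    # Drop any leading slashes so "//host/path" does not become "http:////host/path".
    return "{}://{}".format(default, uri.lstrip("/"))


link = "hive.org.uk/wayback/archive/20080304103855/http://www.google.com/"
print(ensure_scheme(link))
```

Something like this would run on each archive link before it is handed to the requests call. Whether to repair the link or skip it with a warning is a design choice for the maintainers.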
