Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add http/https schema correctly #242

Merged
merged 5 commits into from Jan 17, 2019
Merged

Conversation

gilv
Copy link
Contributor

@gilv gilv commented Oct 11, 2018

Sometimes host provided to the method read_csv() may already starts with http or https.
However existing code always adds "http://", this makes invalid host if http or https was part of host url, making it http://https:// or http://https://

Copy link
Contributor

@menshikh-iv menshikh-iv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @gilv, thanks for PR

Please also add a test for it.

@@ -615,3 +615,8 @@ def _encoding_wrapper(fileobj, mode, encoding=None, errors=DEFAULT_ERRORS):
else:
decoder = codecs.getwriter(encoding)
return decoder(fileobj, errors=errors)

def _add_sheme_to_host(host):
if (host.startswith('http')):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary outer brackets, please remove it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if the host is httpsomething.com, what will happen? Probably you need to check "http://" instead.

@gilv
Copy link
Contributor Author

gilv commented Oct 11, 2018

@menshikh-iv added tests and addressed the comments. thanks

class Test(unittest.TestCase):


def test_host_name(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a suitable place for the test, this isn't an integration test, please add your test to existing files in smart_open/tests/

@menshikh-iv
Copy link
Contributor

Thanks @gilv, @mpenkov wdyt? Can we merge this?

Copy link
Collaborator

@mpenkov mpenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, needs some minor changes.

@@ -615,3 +615,8 @@ def _encoding_wrapper(fileobj, mode, encoding=None, errors=DEFAULT_ERRORS):
else:
decoder = codecs.getwriter(encoding)
return decoder(fileobj, errors=errors)

def _add_sheme_to_host(host):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/sheme/scheme

host = 'http://a.com/b'
expected = 'http://a.com/b'
self.assertTrue(expected == smart_open_lib._add_sheme_to_host(host))
host = 'a.com/b'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each of these should be a separate test case. Putting multiple checks into a single test case is bad practice, because one failure can mask subsequent ones.

def test_host_name(self):
host = 'http://a.com/b'
expected = 'http://a.com/b'
self.assertTrue(expected == smart_open_lib._add_sheme_to_host(host))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to store the result of the function in a separate variable use .assertEquals(expected, actual).

That way, if the assert fails, the test runner will show you both the expected and actual values, and how they differ. That will save you time debugging.

If you just use .assertTrue, the test runner will just show you when that value is False, and you'll have to debug your code to work out how exactly the comparison failed.

@mpenkov
Copy link
Collaborator

mpenkov commented Oct 12, 2018

@menshikh-iv I guess we can merge after the minor issues are fixed. I think it may be simpler to deal with the kwargs merge first, because the two will (may?) conflict.

@menshikh-iv
Copy link
Contributor

Agree @mpenkov

@gilv please fix last comments and we'll merge your PR after #230

@gilv
Copy link
Contributor Author

gilv commented Oct 12, 2018

@menshikh-iv Done

@gilv
Copy link
Contributor Author

gilv commented Dec 9, 2018

@mpenkov @menshikh-iv why it's not merged yet?

@menshikh-iv
Copy link
Contributor

So, looks like #230 in slow dev, in that case, I merge current one, thank you @gilv, sorry for late answer.

@menshikh-iv menshikh-iv changed the title endpoint url might contains http or https Add http/https schema correctly Jan 17, 2019
@menshikh-iv menshikh-iv merged commit 55bb2c5 into piskvorky:master Jan 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants