Skip to content
Please note that GitHub no longer supports Internet Explorer.

We recommend upgrading to the latest Microsoft Edge, Google Chrome, or Firefox.

Learn more
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove global initializers #476

Open
wants to merge 5 commits into
base: master
from

Conversation

@GuillaumeGomez
Copy link

GuillaumeGomez commented Jan 21, 2020

Fixes #475.

Since it changes the API, it'll require to change the version from "1.*" to "2.0".

With this, no more global variables with mutable accesses (unless I missed one?). The remaining ones don't seem to have mutable access so I think it's fine.

The first commit is very big, the others are much smaller.

I also took the liberty to add a "libpostal_get_version" function. It might come in handy at some point.

If you have any question, don't hesitate to ask!

@GuillaumeGomez

This comment has been minimized.

Copy link
Author

GuillaumeGomez commented Jan 21, 2020

@GuillaumeGomez GuillaumeGomez force-pushed the GuillaumeGomez:remove-global-initializers branch from 76d2172 to 2a6fcb6 Jan 21, 2020
@GuillaumeGomez

This comment has been minimized.

Copy link
Author

GuillaumeGomez commented Jan 21, 2020

Weird CI errors like:

gcc-4.8: error: unrecognized command line option ‘-Wdate-time’

Is it expected?

Copy link
Contributor

AeroXuk left a comment

Will try to find time to go through the whole change at the weekend.
For now, the change looks positive to me.
I have a suggestion about just having the instance parameter passed into the functions rather than two parameters added to some methods.

libpostal_t *instance = libpostal_setup();
language_classifier_t *classifier = libpostal_setup_language_classifier();
if (instance == NULL || classifier == NULL) {
Comment on lines +13 to +15

This comment has been minimized.

Copy link
@AeroXuk

AeroXuk Jan 24, 2020

Contributor

Suggest for language_classifier_t *classifier to be a property of libpostal_t.

Suggested change
libpostal_t *instance = libpostal_setup();
language_classifier_t *classifier = libpostal_setup_language_classifier();
if (instance == NULL || classifier == NULL) {
libpostal_t *instance = libpostal_setup();
if (instance == NULL || !libpostal_setup_language_classifier(instance)) {

I think having a single libpostal struct/object to pass into the functions would be more convenient to the user.
Checking whether the classifier or parser have been initialised can then be a check for a null pointer.
libpostal_teardown can then also check and free the classifier or parser if they haven't been freed.

Next two suggestions are to illustrate this way.

This comment has been minimized.

Copy link
@GuillaumeGomez

GuillaumeGomez Jan 24, 2020

Author

This seems like a good point. language_classifier_t functions appear to only perform non-mutable operations so storing this object into libpostal_t is definitely worth it. Updating the code then!

This comment has been minimized.

Copy link
@GuillaumeGomez

GuillaumeGomez Jan 24, 2020

Author

Actually, that doesn't sound that good: in the current configuration, you can setup multiple classifiers and also, it "forces" user to call the method in order to get the "language_classifier_t" instance.

This comment has been minimized.

Copy link
@GuillaumeGomez

GuillaumeGomez Jan 24, 2020

Author

To extend a bit my last comment: we could maybe go the other way around: wrapping a libpostal_t instance into language_classifier_t, but then, to be safe, we'd need to add back some globals or shared variables to be sure that libpostal_t isn't freed before language_classifier_t.

The API looks a bit less nice, I definitely agree with you on this point, but I don't see any good enough solution to make this work without big downsides.

This comment has been minimized.

Copy link
@AeroXuk

AeroXuk Jan 24, 2020

Contributor

Users can still setup multiple classifiers if they have separate instances of libpostal.
In regards to forcing users to call the setup, the library already requires users to call extra setup functions for the larger modules.

I would try to have only a single extra instance parameter as people upgrading from previous versions of libpostal to an updated one will want the minimum amount of code refactoring.
Being that the project has the aim of being easy to use (shipping with pre-trained model), we should aim to have a minimal impact on people already implementing the library.

This comment has been minimized.

Copy link
@GuillaumeGomez

GuillaumeGomez Jan 24, 2020

Author

This is a fair point. Well, I warned about it. :) I'll make the change then but I really don't think this is a good idea.

Also, the bindings have less than 50 functions to bind so it's pretty simple to update. ;)

This comment has been minimized.

Copy link
@GuillaumeGomez

GuillaumeGomez Jan 24, 2020

Author

I just can't keep stop thinking about it: the API isn't great. For example: libpostal_place_languages is taking two char**. Which doesn't make sense since the first one is suppose to "match" the second, why not an array of element { label: char*, value: char* }?

So anyway, the changes remain small, even if we keep the language_classifier_t type independant. So why not going to the bottom of this change and prevent all possible data races?

This comment has been minimized.

Copy link
@AeroXuk

AeroXuk Jan 24, 2020

Contributor

I just can't keep stop thinking about it: the API isn't great. For example: libpostal_place_languages is taking two char**. Which doesn't make sense since the first one is suppose to "match" the second, why not an array of element { label: char*, value: char* }?

I think a lot of the previous API decisions have been based around the language bindings. I'm going to assume for some language bindings it's easier to have separate labels and values arrays then convert them into a language native datatype.

I'll try to go through the rest of this change request tomorrow and give any feedback on other areas. I think this having an instanced API is a great step forward. And will help the project towards being thread-safe.

This comment has been minimized.

Copy link
@GuillaumeGomez

GuillaumeGomez Jan 24, 2020

Author

Ok, we can go back to this discussion once you're done. But I'd really prefer to keep it as is for the previously mentioned reasons.

@@ -108,7 +110,7 @@ int main(int argc, char **argv) {
char **values = cstring_array_to_strings(values_array);

size_t num_near_dupe_hashes = 0;
char **near_dupe_hashes = libpostal_near_dupe_hashes_languages(num_components, labels, values, options, num_languages, languages, &num_near_dupe_hashes);
char **near_dupe_hashes = libpostal_near_dupe_hashes_languages(classifier, instance, num_components, labels, values, options, num_languages, languages, &num_near_dupe_hashes);

This comment has been minimized.

Copy link
@AeroXuk

AeroXuk Jan 24, 2020

Contributor
Suggested change
char **near_dupe_hashes = libpostal_near_dupe_hashes_languages(classifier, instance, num_components, labels, values, options, num_languages, languages, &num_near_dupe_hashes);
char **near_dupe_hashes = libpostal_near_dupe_hashes_languages(instance, num_components, labels, values, options, num_languages, languages, &num_near_dupe_hashes);
libpostal_teardown();
libpostal_teardown_language_classifier();
libpostal_teardown(&instance);
libpostal_teardown_language_classifier(&classifier);

This comment has been minimized.

Copy link
@AeroXuk

AeroXuk Jan 24, 2020

Contributor
Suggested change
libpostal_teardown_language_classifier(&classifier);
libpostal_teardown_language_classifier(instance);
libpostal_teardown(&instance);
@AeroXuk

This comment has been minimized.

Copy link
Contributor

AeroXuk commented Jan 24, 2020

Weird CI errors like:

gcc-4.8: error: unrecognized command line option ‘-Wdate-time’

Is it expected?

There are generally quite a few warnings from the CI systems.
The failure on travis is due to something else not this change.

@GuillaumeGomez

This comment has been minimized.

Copy link
Author

GuillaumeGomez commented Jan 24, 2020

I'll check more in depth what's going on and what did I broke then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.