Remove global initializers #476
cc @AeroXuk
76d2172
to
2a6fcb6
Weird CI errors like:
Is it expected?
I will try to find time to go through the whole change at the weekend.
libpostal_t *instance = libpostal_setup();
language_classifier_t *classifier = libpostal_setup_language_classifier();
if (instance == NULL || classifier == NULL) {
AeroXuk
Jan 24, 2020
Contributor
I suggest making language_classifier_t *classifier a member of libpostal_t.
- libpostal_t *instance = libpostal_setup();
- language_classifier_t *classifier = libpostal_setup_language_classifier();
- if (instance == NULL || classifier == NULL) {
+ libpostal_t *instance = libpostal_setup();
+ if (instance == NULL || !libpostal_setup_language_classifier(instance)) {
I think having a single libpostal struct/object to pass into the functions would be more convenient for the user.
Checking whether the classifier or parser has been initialised then becomes a check for a null pointer.
libpostal_teardown can then also check for and free the classifier or parser if they haven't been freed already.
The next two suggestions illustrate this approach.
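To make the proposal concrete, here is a minimal sketch of a single instance that owns its optional modules. All names ending in `_sketch_t` and all `sketch_*` functions are hypothetical stand-ins, not libpostal's real types or API; the point is only the null-pointer check and the teardown that frees whatever is still attached.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins: the real libpostal types are far more involved. */
typedef struct { int loaded; } classifier_sketch_t;
typedef struct { int loaded; } parser_sketch_t;

typedef struct {
    classifier_sketch_t *classifier; /* NULL until set up */
    parser_sketch_t *parser;         /* NULL until set up */
} instance_sketch_t;

instance_sketch_t *sketch_setup(void) {
    instance_sketch_t *inst = malloc(sizeof *inst);
    if (inst == NULL) return NULL;
    inst->classifier = NULL;
    inst->parser = NULL;
    return inst;
}

/* Attaches a classifier to the instance; safe to call more than once. */
bool sketch_setup_classifier(instance_sketch_t *inst) {
    if (inst == NULL) return false;
    if (inst->classifier != NULL) return true; /* already initialised */
    inst->classifier = malloc(sizeof *inst->classifier);
    if (inst->classifier == NULL) return false;
    inst->classifier->loaded = 1;
    return true;
}

/* "Has it been initialised?" is just a null-pointer check. */
bool sketch_has_classifier(const instance_sketch_t *inst) {
    return inst != NULL && inst->classifier != NULL;
}

/* Frees any module still attached, then the instance itself.
   Taking a pointer-to-pointer lets us reset the caller's handle to NULL. */
void sketch_teardown(instance_sketch_t **inst) {
    if (inst == NULL || *inst == NULL) return;
    free((*inst)->classifier); /* free(NULL) is a no-op */
    free((*inst)->parser);
    free(*inst);
    *inst = NULL;
}
```

With this shape, callers carry one handle around, and forgetting to tear down the classifier separately does not leak it.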
GuillaumeGomez
Jan 24, 2020
Author
This seems like a good point. The language_classifier_t functions appear to perform only non-mutating operations, so storing this object in libpostal_t is definitely worth it. Updating the code then!
GuillaumeGomez
Jan 24, 2020
Author
Actually, that doesn't sound that good: in the current configuration, you can set up multiple classifiers, and it also "forces" the user to call the method in order to get the language_classifier_t instance.
GuillaumeGomez
Jan 24, 2020
Author
To extend my last comment a bit: we could maybe go the other way around and wrap a libpostal_t instance in language_classifier_t, but then, to be safe, we'd need to add back some globals or shared variables to make sure that libpostal_t isn't freed before language_classifier_t.
The API looks a bit less nice, I definitely agree with you on this point, but I don't see any good enough solution to make this work without big downsides.
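One possible reading of "shared variables" here is a reference count on the wrapped object. The sketch below is an assumption about what that could look like, not something proposed in this PR; `core_sketch_t`, `wrapper_sketch_t` and the functions are hypothetical names, and the plain `int` counter is not thread-safe as written.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the wrapped instance. */
typedef struct {
    int refcount; /* the shared variable; would need to be atomic across threads */
} core_sketch_t;

/* Hypothetical stand-in for the wrapping classifier. */
typedef struct {
    core_sketch_t *core; /* counted reference to the wrapped instance */
} wrapper_sketch_t;

core_sketch_t *core_new(void) {
    core_sketch_t *c = malloc(sizeof *c);
    if (c != NULL) c->refcount = 1; /* the creator holds the first reference */
    return c;
}

/* Drops one reference; frees on the last one. Returns 1 if the core was freed. */
int core_release(core_sketch_t *c) {
    if (c == NULL) return 0;
    if (--c->refcount > 0) return 0;
    free(c);
    return 1;
}

/* The wrapper takes its own reference, so the core outlives the user's handle. */
wrapper_sketch_t *wrapper_new(core_sketch_t *c) {
    if (c == NULL) return NULL;
    wrapper_sketch_t *w = malloc(sizeof *w);
    if (w == NULL) return NULL;
    c->refcount++;
    w->core = c;
    return w;
}

/* Releases the wrapper's reference. Returns 1 if that also freed the core. */
int wrapper_free(wrapper_sketch_t *w) {
    if (w == NULL) return 0;
    int freed = core_release(w->core);
    free(w);
    return freed;
}
```

The extra bookkeeping, and the need to make the counter atomic in threaded code, is the kind of downside the comment above refers to.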
AeroXuk
Jan 24, 2020
Contributor
Users can still set up multiple classifiers if they have separate instances of libpostal.
With regard to forcing users to call the setup function, the library already requires users to call extra setup functions for the larger modules.
I would try to have only a single extra instance parameter, as people upgrading from previous versions of libpostal will want the minimum amount of code refactoring.
Since the project aims to be easy to use (it ships with a pre-trained model), we should aim to have a minimal impact on people already using the library.
GuillaumeGomez
Jan 24, 2020
Author
This is a fair point. Well, I warned about it. :) I'll make the change then but I really don't think this is a good idea.
Also, the bindings have less than 50 functions to bind so it's pretty simple to update. ;)
GuillaumeGomez
Jan 24, 2020
Author
I just can't stop thinking about it: the API isn't great. For example, libpostal_place_languages takes two char**, which doesn't make sense since the first one is supposed to "match" the second. Why not an array of element { label: char*, value: char* }?
So anyway, the changes remain small, even if we keep the language_classifier_t type independent. So why not go all the way with this change and prevent all possible data races?
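To illustrate the two shapes being compared, here is a small sketch. `component_sketch_t` and both lookup functions are hypothetical and only show the difference in calling convention; they are not libpostal functions, and the label strings are just sample data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical paired element, as suggested in the comment above. */
typedef struct {
    const char *label;
    const char *value;
} component_sketch_t;

/* Parallel-array style: the caller must keep labels[i] and values[i] in step. */
const char *find_value_parallel(size_t n,
                                const char *const *labels,
                                const char *const *values,
                                const char *label) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(labels[i], label) == 0) return values[i];
    }
    return NULL; /* label not present */
}

/* Struct-array style: each label travels with its value. */
const char *find_value_struct(size_t n,
                              const component_sketch_t *components,
                              const char *label) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(components[i].label, label) == 0) return components[i].value;
    }
    return NULL; /* label not present */
}
```

The struct form rules out mismatched array lengths by construction, while the parallel form can be simpler to fill from a binding that already holds two separate lists.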
AeroXuk
Jan 24, 2020
Contributor
> I just can't stop thinking about it: the API isn't great. For example, libpostal_place_languages takes two char**, which doesn't make sense since the first one is supposed to "match" the second. Why not an array of element { label: char*, value: char* }?
I think a lot of the previous API decisions have been based around the language bindings. I assume that for some language bindings it's easier to take separate labels and values arrays and then convert them into a language-native datatype.
I'll try to go through the rest of this change request tomorrow and give feedback on any other areas. I think having an instanced API is a great step forward and will help the project towards being thread-safe.
GuillaumeGomez
Jan 24, 2020
Author
Ok, we can go back to this discussion once you're done. But I'd really prefer to keep it as is for the previously mentioned reasons.
@@ -108,7 +110,7 @@ int main(int argc, char **argv) {
  char **values = cstring_array_to_strings(values_array);

  size_t num_near_dupe_hashes = 0;
- char **near_dupe_hashes = libpostal_near_dupe_hashes_languages(num_components, labels, values, options, num_languages, languages, &num_near_dupe_hashes);
+ char **near_dupe_hashes = libpostal_near_dupe_hashes_languages(classifier, instance, num_components, labels, values, options, num_languages, languages, &num_near_dupe_hashes);
AeroXuk
Jan 24, 2020
Contributor
- char **near_dupe_hashes = libpostal_near_dupe_hashes_languages(classifier, instance, num_components, labels, values, options, num_languages, languages, &num_near_dupe_hashes);
+ char **near_dupe_hashes = libpostal_near_dupe_hashes_languages(instance, num_components, labels, values, options, num_languages, languages, &num_near_dupe_hashes);
- libpostal_teardown();
- libpostal_teardown_language_classifier();
+ libpostal_teardown(&instance);
+ libpostal_teardown_language_classifier(&classifier);
AeroXuk
Jan 24, 2020
Contributor
- libpostal_teardown_language_classifier(&classifier);
+ libpostal_teardown_language_classifier(instance);
+ libpostal_teardown(&instance);
There are generally quite a few warnings from the CI systems.
I'll check in more depth what's going on and what I broke, then.
GuillaumeGomez commented Jan 21, 2020 (edited)
Fixes #475.
Since this changes the API, the version will need to go from "1.*" to "2.0".
With this, there are no more global variables with mutable access (unless I missed one?). The remaining globals don't seem to be mutated, so I think that's fine.
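As a generic illustration of why removing mutable globals matters (this is not code from the PR, and every name in it is hypothetical), compare a function that mutates hidden shared state with one that mutates only the context it is handed:

```c
#include <stddef.h>

/* Global-state style: every caller, on every thread, shares this counter.
   Concurrent calls race on it unless the caller adds locking. */
static size_t g_calls_sketch = 0;

size_t count_call_global(void) {
    return ++g_calls_sketch;
}

/* Instance style: the state lives in a caller-owned context. */
typedef struct {
    size_t calls;
} ctx_sketch_t;

size_t count_call_ctx(ctx_sketch_t *ctx) {
    return ++ctx->calls;
}
```

With the instance style, two threads that each own a separate `ctx_sketch_t` never touch the same memory, so no synchronisation is needed between them. Sharing one context across threads would still require it.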
The first commit is very big, the others are much smaller.
I also took the liberty of adding a "libpostal_get_version" function. It might come in handy at some point.
If you have any questions, don't hesitate to ask!