-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[do not merge] refactoring #2
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I absolutely like that you
- (re)moved unused code
- used a code formatter
- moved a lot of utility functions out of the main code
The few remarks I made are maybe out of scope for this refactoring, because these refactoring are mostly pure refactoring, i.e. reformatting and moving code.
We should have a deeper look after this. Let's talk about it on Friday?
@click.option( | ||
"--save_layout", | ||
"-sl", | ||
help="if a directory is given, plot of layout will be saved there", | ||
type=click.Path(exists=True, file_okay=False), | ||
) | ||
@click.option( | ||
"--save_deskewed", | ||
"-sd", | ||
help="if a directory is given, deskewed image will be saved there", | ||
type=click.Path(exists=True, file_okay=False), | ||
) | ||
@click.option( | ||
"--save_all", | ||
"-sa", | ||
help="if a directory is given, all plots needed for documentation will be saved there", | ||
type=click.Path(exists=True, file_okay=False), | ||
) | ||
@click.option( | ||
"--allow_enhancement", | ||
"-ae", | ||
is_flag=True, | ||
help="if this parameter set to true, this tool would check that input image need resizing and enhancement or not. If so output of resized and enhanced image and corresponding layout data will be written in out directory", | ||
) | ||
@click.option( | ||
"--curved_line", | ||
"-cl", | ||
is_flag=True, | ||
help="if this parameter set to true, this tool will try to return contoure of textlines instead of rectabgle bounding box of textline. This should be taken into account that with this option the tool need more time to do process.", | ||
) | ||
@click.option( | ||
"--full_layout", | ||
"-fl", | ||
is_flag=True, | ||
help="if this parameter set to true, this tool will try to return all elements of layout.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if that was in there before, but I like that these are proper options!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't, I fixed this in 97bc57b @vahidrezanezhad you can define command line options as "flags", so they represent boolean values which makes the code cleaner because you don't need to compare strings "true"
, "TRUE"
etc. if defined like this.
from .utils.contour import ( | ||
contours_in_same_horizon, | ||
filter_contours_area_of_image_interiors, | ||
filter_contours_area_of_image_tables, | ||
filter_contours_area_of_image, | ||
find_contours_mean_y_diff, | ||
find_features_of_contours, | ||
find_new_features_of_contoures, | ||
get_text_region_boxes_by_given_contours, | ||
get_textregion_contours_in_org_image, | ||
return_bonding_box_of_contours, | ||
return_contours_of_image, | ||
return_contours_of_interested_region, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice that these (and other) utility functions are now moved out of the main code!
full_layout=False, | ||
allow_scaling=False, | ||
headers_off=False | ||
): | ||
self.image_dir = image_dir # XXX This does not seem to be a directory as the name suggests, but a file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've noticed the problem that image_dir
is not a directory, in my initial review of sbb-textline-detector. It would be nice if that problem could be resolved too, i.e. renaming it to image_file
or similar. (That XXX
is from me and is now in multiple variants of the code.)
That is, if it is still the case that image_dir
is not a directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. The code would benefit a lot from more consistent naming of variables and functions in general. But best to do that in a separate PR.
if self.f_name is None: | ||
try: | ||
self.f_name = image_dir.split('/')[len(image_dir.split('/')) - 1] | ||
self.f_name = self.f_name.split('.')[0] | ||
self.f_name = image_dir.split("/")[len(image_dir.split("/")) - 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be clearer as:
self.f_name = image_dir.split("/")[-1]
or even:
self.f_name = os.path.basename(image_dir)
(Here again the misleading name of image_dir
, I think it should be image_file
or image_path
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, also I would use os.path.join
or Path()
for concatenating paths throughout.
sbb_newspapers_org_image/eynollah.py
Outdated
self.model_dir_of_enhancemnet = dir_models + "/model_enhancement.h5" | ||
self.model_dir_of_col_classifier = dir_models + "/model_scale_classifier.h5" | ||
self.model_region_dir_p = dir_models + "/model_main_covid19_lr5-5_scale_1_1_great.h5" # dir_models +'/model_main_covid_19_many_scalin_down_lr5-5_the_best.h5'#'/model_main_covid19_lr5-5_scale_1_1_great.h5'#'/model_main_scale_1_1und_1_2_corona_great.h5' | ||
# self.model_region_dir_p_ens = dir_models +'/model_ensemble_s.h5'#'/model_main_covid19_lr5-5_scale_1_1_great.h5'#'/model_main_scale_1_1und_1_2_corona_great.h5' | ||
self.model_region_dir_p2 = dir_models + "/model_main_home_corona3_rot.h5" | ||
|
||
self.model_region_dir_fully_np = dir_models + "/model_no_patches_class0_30eopch.h5" | ||
self.model_region_dir_fully = dir_models + "/model_3up_new_good_no_augmentation.h5" # "model_3col_p_soft_10_less_aug_binarization_only.h5" | ||
|
||
self.model_page_dir = dir_models + "/model_page_mixed_best.h5" | ||
self.model_region_dir_p_ens = dir_models + "/model_ensemble_s.h5" # dir_models +'/model_main_covid_19_many_scalin_down_lr5-5_the_best.h5' #dir_models +'/model_ensemble_s.h5' | ||
###self.model_region_dir_p = dir_models +'/model_layout_newspapers.h5'#'/model_ensemble_s.h5'#'/model_layout_newspapers.h5'#'/model_ensemble_s.h5'#'/model_main_home_5_soft_new.h5'#'/model_home_soft_5_all_data.h5' #'/model_main_office_long_soft.h5'#'/model_20_cat_main.h5' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe dir
is misleading here too. Also, os.path.join
could be used to join a directory name with a file name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the effort @kba, makes it a joy (almost) to review ;)
Mostly minor remarks and suggestions from my side, possibly also to target in a separate PR, or for discussion in the next meeting.
|
||
__version__ = '1.0' | ||
""" | ||
tool to extract table form data from alto xml data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DocString should be updated and start with a capital letter and end with a fullstop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 and offer more details. And we should reuse that in the docstring for the CLI and upcoming OCR-D CLI.
self.model_textline_dir = dir_models + "/model_textline_newspapers.h5" #'/model_hor_ver_home_trextline_very_good.h5'# '/model_hor_ver_1_great.h5'#'/model_curved_office_works_great.h5' | ||
|
||
def predict_enhancement(self, img): | ||
model_enhancement, session_enhancemnet = self.start_new_session_and_model(self.model_dir_of_enhancemnet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can consider renaming some of the models or making the model names more consistent (good/best/very good/great) @vahidrezanezhad
sbb_newspapers_org_image/eynollah.py
Outdated
|
||
else: | ||
peaks_new_tot=peaks_e[:] | ||
data.set("xmlns", "http://schema.primaresearch.org/PAGE/gts/pagecontent/2017-07-15") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we update this to PAGE-XML 2019 (http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15)?
(also L1459)
sbb_newspapers_org_image/eynollah.py
Outdated
else: | ||
point_up = peaks[jj] + first_nonzero - int(1.23 * dis_to_next_up) ##+int(dis_to_next_up*1./4.0) | ||
point_down = peaks[jj] + first_nonzero + int(1.33 * dis_to_next_down) ###-int(dis_to_next_down*1./4.0) | ||
data.set("xmlns", "http://schema.primaresearch.org/PAGE/gts/pagecontent/2017-07-15") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again update to PAGE-XML 2019 when possible (same for L1966)
(fixes #6)
…to refactor # Conflicts: # sbb_newspapers_org_image/eynollah.py
Extends #1 with some initial refactorings:
self
) to utils.pyI'll keep it at that for now because I don't want to inadvertently break anything, so further refactoring must wait until tests/CI are set up.