Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New tool to suppress leading and trailing spaces. #783

Merged
merged 1 commit into from May 14, 2019
Merged

Conversation

SNoiraud
Copy link
Member

@SNoiraud SNoiraud commented Feb 3, 2019

Fixes #10824

This tool is looking for place names with leading or/and trailing spaces.
It also looks for in the first name and surname.

For each entry which contains leading or trailing spaces,
a row is added in a treeview.
You can see where the spaces are for each row as the name is underlined.

If you double click on the row, you can edit the Place or the Person.
For places, I search for comma. They are forbidden for lat/lon (related to (https://gramps-project.org/bugs/view.php?id=11114)

@SNoiraud SNoiraud added this to the v5.1 milestone Feb 3, 2019
@SNoiraud
Copy link
Member Author

SNoiraud commented Feb 3, 2019

fr10824 1

@codecov-io
Copy link

codecov-io commented Feb 3, 2019

Codecov Report

Merging #783 into master will decrease coverage by 0.05%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #783      +/-   ##
==========================================
- Coverage   42.05%      42%   -0.06%     
==========================================
  Files        1065     1066       +1     
  Lines      144427   144625     +198     
==========================================
  Hits        60743    60743              
- Misses      83684    83882     +198
Impacted Files Coverage Δ
gramps/plugins/tool/removespaces.py 0% <0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6bca408...7712449. Read the comment docs.

@Nick-Hall
Copy link
Member

Do you plan to keep the handle column?

@SNoiraud
Copy link
Member Author

SNoiraud commented Feb 3, 2019

I could hide it. I need it to call the editor.

@prculley
Copy link
Contributor

prculley commented Feb 5, 2019

If you want to get this out to users anytime soon, you might consider packaging it as an addon. The way thing are going, I don't think we will see Gramps5.1 anytime this year.

@Nick-Hall
Copy link
Member

The way thing are going, I don't think we will see Gramps5.1 anytime this year.

We will probably release v5.1 in July or August.

@lordemannd
Copy link
Contributor

Can it also look at Alternate Names, Longitude, Latitude? (really any user enter-able text field)

@lordemannd
Copy link
Contributor

I added the newest changes to the 5.0.1 version on windows and get the error reporting wizard.
Is this a GTK version issue? Should I just wait for 5.1?

8609344: ERROR: tool.py: line 257: Failed to start tool.
Traceback (most recent call last):
File "C:\Program Files\GrampsAIO64-5.0.1-WORK\gramps\gui\plug\tool.py", line 253, in gui_tool
callback = callback)
File "C:\Program Files\GrampsAIO64-5.0.1-WORK\gramps\plugins\tool\removespaces.py", line 77, in init
top_dialog = Glade()
File "C:\Program Files\GrampsAIO64-5.0.1-WORK\gramps\gui\glade.py", line 158, in init
self.add_from_string(data)
File "AIO/gi/overrides/Gtk.py", line 464, in add_from_string
GLib.GError: gtk-builder-error-quark: .:125:58 Invalid property: GtkScrolledWindow.max_content_height (11)

  • Python 3.6.4 (Success version 3.2.0 or greater installed.)
  • GTK+ 3.18.9 (Success version 3.10.0 or greater installed.)
  • PyGObject 3.26.1 (Success version 3.12.0 or greater installed.)

@Nick-Hall
Copy link
Member

The max_content_height property requires Gtk 3.22 and above. We decided on Gtk v3.12 for Gramps v5.1.

@lordemannd
Copy link
Contributor

lordemannd commented Apr 10, 2019

After "Come back to Gtk 3.12 : remove max_content_height"
I get the following error. (I have checked and it looks like I have all the changes)

279416: ERROR: tool.py: line 257: Failed to start tool.
Traceback (most recent call last):
File "C:\Program Files\GrampsAIO64-5.0.1-WORK\gramps\gui\plug\tool.py", line 253, in gui_tool
callback = callback)
File "C:\Program Files\GrampsAIO64-5.0.1-WORK\gramps\plugins\tool\removespaces.py", line 77, in init
top_dialog = Glade()
File "C:\Program Files\GrampsAIO64-5.0.1-WORK\gramps\gui\glade.py", line 158, in init
self.add_from_string(data)
File "AIO/gi/overrides/Gtk.py", line 464, in add_from_string
GLib.GError: gtk-builder-error-quark: .:136:67 Invalid property: GtkScrolledWindow.propagate_natural_width (11)

@SNoiraud
Copy link
Member Author

It's difficult to know all properties glade uses. I am on Ubuntu with Gtk 3.22.
I hope I forgot nothing else.

@lordemannd
Copy link
Contributor

Thanks, its working now. I will do some testing.

@lordemannd
Copy link
Contributor

It doesn't seem to be checking the alternate surname or given names.

@SNoiraud
Copy link
Member Author

Can you attach a database example (.gramps) for that.

@lordemannd
Copy link
Contributor

lordemannd commented Apr 12, 2019

Using current Gramps, the process of backup (.gramp) and then import fixes many of these issues.
I tried using other export methods, they also result in the data being corrected.

The specific issue in my test was caused originally by cutting/pasting names from web sources. When doing this, sometimes the copied text includes control characters (carriage return, line feed, tab, vtab, bell, etc.) that are considered "white space".

In older versions of Gramps these control characters where not displayed, making it nearly impossible to find them. (but causing problems in many generated reports )
Current Gramps makes them visible.

On Windows, I can recreate the problem in alternate names easily.
(my apology for the painful detail, just trying to be clear)

  • in Gramps, add a new name to a person to open the name editor.
  • Windows, open the notepad application
  • add 2 lines of text.
  • move cursor to the end of the 1st line
  • left click and hold button
  • drag cursor to end of 2nd line
  • release button
    you should now have the 2nd line of text highlighted (including the invisible end of the 1st line)
  • right click and choose copy
  • return to Gramps and paste the text into the Given name and Surname field.
  • click OK button to save this alternate name.
  • run the space removal tool.
  • notice that this alternate name is not found.
  • now make the alternate name the default name
  • rerun the tool
  • notice that as the default name, the issue is detected.

@SNoiraud
Copy link
Member Author

SNoiraud commented Apr 13, 2019

run the space removal tool.
notice that this alternate name is not found.

It works for me. We have no alternate name here. The Given and Surname names are displayed.

now make the alternate name the default name

It works.

I attached the .gramps file for which it works. I manually edit to add the new line.
when I import it, this new line is removed only for the Given name.

So, my question now is:
Why given names are processed differently than surname or alternate names?

Perhaps solving this kind of problem would be sufficient and make this tool obsolete.

remove .txt in the file name before import it.
PR783.gramps.txt

@lordemannd
Copy link
Contributor

lordemannd commented Apr 13, 2019

In your PR783.gramps.txt it looks like the "newline" (\n) is in the middle of the name text. Did you try the "newline" character as the 1st or last character in the name?

@SNoiraud
Copy link
Member Author

SNoiraud commented Apr 13, 2019

Did you try the "newline" character as the 1st or last character in the name?

Yes. in the provided file, given, surname and alternate begin with a new line.

@SNoiraud
Copy link
Member Author

SNoiraud commented Apr 13, 2019

I have a patch to suppress all invisible characters in the following fields:
call_name, nick_name, family_nick_name, first_name, suffix, title, surname and prefix

I think this is a better solution

try_1.patch.txt

with this patch, this tool is obsolete for the Person.

@SNoiraud
Copy link
Member Author

The best is perhaps to modify the MonitoredEntry class.

@SNoiraud
Copy link
Member Author

This is the new patch for the MonitoredEntry.
It is now impossible to enter
line feed, carriage return, tab, ... in a field

try_2.patch.txt

@Nick-Hall
Copy link
Member

This seems to be progressing well.

@SNoiraud
Copy link
Member Author

I think we do not need this tool if we modify the MonitoredEntry with the try_2 patch.
It works on all views: people, events, places, media, sources and repositories

I close this PR. I'll create a new one. Can it be a bug for gramps 5.0 ?

@lordemannd
Copy link
Contributor

it can be a bug if the we think the Gramps specification says: leading and trailing white space are not allowed in field text and will be removed.

also saying that white space is any non-printing character to include space, nl, cr, tab, vtab, bell, etc.
(i personally like this and would further say that all white space strings within the text should be replaced with a single space)

Does this apply directly for text blocks like in a Note? or should blocks of text have different rules?

BUT i think this tool is still needed:
1 - in case there are bugs in the bug fix
2 - for users that are upgrading from a previous version (like 3.4 to 5.0), they will need to be able to clean up things that older Gramps versions allowed. The upgrade I refer to is where an existing BSD DB is converted in place instead of the backup and import method.

@Nick-Hall
Copy link
Member

Can it be a bug for gramps 5.0 ?

Yes, I think that allowing control characters to be pasted into an entry field could be considered a bug. Please open another PR.

We probably should allow a non-breaking space because it is used to indicate that multi-part names in Arabic and is used by the Given Names Cloud gramplet. However, you only seem to be removing control codes in the ASCII range 0-31.

@Nick-Hall
Copy link
Member

Does this apply directly for text blocks like in a Note?

Notes are handled differently and may contain whitespace.

@lordemannd
Copy link
Contributor

for Arabic, is the non-breaking space ever the 1st or last character in the field?

@SNoiraud
Copy link
Member Author

SNoiraud commented Apr 15, 2019

Please open another PR.

See #811 for gramps 5.0

@Nick-Hall
Copy link
Member

@SNoiraud What was the decision about this? Do we still need the tool?

@SNoiraud
Copy link
Member Author

What was the decision about this? Do we still need the tool?

I think we need this tool after an import.
Perhaps the problem is without the name: this plugin also replace commas and not only leading and trailing spaces.
What could be the name? (standardization of input fields ?)

@Nick-Hall
Copy link
Member

What could be the name? (standardization of input fields ?)

I suppose it is a form of data cleansing, so something like "Clean input data"?

@Nick-Hall
Copy link
Member

@SNoiraud Is this ready to merge?

@SNoiraud
Copy link
Member Author

We can merge it.

This tool is looking for place names with leading or/and trailing spaces.
It also looks for in the first name and surname.

For each entry which contains leading or trailing spaces,
a row is added in a treeview.
You can see where the spaces are for each row as the name is underlined.

If you double click on the row, you can edit the Place or the Person.

Fixes #10824
@sam-m888
Copy link
Member

sam-m888 commented Jan 21, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants