Export Plaintext Backup sometimes creates invalid XML #342

Closed
hoozey opened this issue Sep 8, 2013 · 17 comments

Comments

@hoozey

hoozey commented Sep 8, 2013

For some messages, the 'body' attribute in the xml file doesn't get closed. This causes "Error importing backup!" when trying to import.

Example of bad XML:

<sms protocol="0" address="5558675309" date="1378522000531" type="1" subject="null" body="Hello world  toa="null" sc_toa="null" service_center="null" read="1" status="0" locked="0" />
@hoozey hoozey closed this as completed Sep 8, 2013
@hoozey hoozey reopened this Sep 8, 2013
@scw
Contributor

scw commented Oct 3, 2013

@hoozey can you provide an example body which causes this? I see a catch for IllegalArgumentException which seems to get triggered in the body statement, but an example message that causes the bad serialization would still be helpful.

@hoozey
Author

hoozey commented Oct 5, 2013

One malformed message I ran into was

body="No  toa="null"

Although I have some perfectly fine ones like this:

body="No" toa="null"

Not sure if there was a trailing space in the original message.

I also have a lot of blank ones:

body=" toa="null"

@kalaxdg

kalaxdg commented Nov 2, 2013

Hi there,

I'm not able to export plaintext at all. Anyone know why that might be? I hit the export button, phone (Galaxy Nexus) works for about 20 seconds and then says "success!" I would expect my phone to take at least several minutes creating a backup. Is there a way that I can fix the issue without losing my archive?

DorianScholz added a commit to DorianScholz/TextSecure that referenced this issue Mar 2, 2014
Make an import dry run after exporting to check for corrupted xml.
This does not yet fix the root cause of signalapp#342, but at least lets the user know about the faulty export.
@DorianScholz
Contributor

Just got bitten by this:

body="Ja  toa="null"

Only one entry was actually corrupted amongst 2300 good ones.
And this was one of about 10 messages that were sent using TextSecure. All others were imported from the SMS app.

Not sure yet what exactly the problem is, so pull request #929 only makes sure the user gets notified about the problem when exporting and not only when importing, because this might be too late...
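
A rough sketch of what such a post-export dry run could look like, assuming plain JVM tooling. The class and method names are hypothetical, and the stock JAXP SAX parser stands in for whatever parser the app's importer uses; the `<smses>` wrapper element is only there to make the samples complete documents.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not TextSecure code: reports whether an export parses as XML.
final class ExportDryRun {
    static boolean isWellFormed(InputStream xml) {
        try {
            // DefaultHandler rethrows fatal errors, so malformed input ends up in the catch.
            SAXParserFactory.newInstance().newSAXParser().parse(xml, new DefaultHandler());
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    static boolean isWellFormed(String xml) {
        return isWellFormed(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        // The two body/toa shapes reported earlier in this thread.
        String good = "<smses><sms body=\"No\" toa=\"null\" /></smses>";
        String bad = "<smses><sms body=\"No  toa=\"null\" /></smses>";
        System.out.println(isWellFormed(good)); // true
        System.out.println(isWellFormed(bad));  // false
    }
}
```

Running the check right after the export means a corrupted file is reported while the original messages are still on the phone.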

DorianScholz added a commit to DorianScholz/TextSecure that referenced this issue Mar 2, 2014
Fixes signalapp#342 by replacing invalid unicode characters in the message body with spaces.
@DorianScholz
Contributor

So, I think I have found the problem.
Icons are represented in the message body by unicode chars in an invalid range according to:
https://android.googlesource.com/platform/libcore/+/master/xml/src/main/java/org/kxml2/io/KXmlSerializer.java line 128:

boolean valid = (c >= 0x20 && c <= 0xd7ff) || (c >= 0xe000 && c <= 0xfffd);

I've added a commit to #929 to replace these chars by spaces before export.
This removes the icons, but preserves at least the text of the messages in the plain text backups.
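
A minimal sketch of that replacement step, assuming nothing beyond the range test quoted above. `BodySanitizer` and its methods are hypothetical names, not the code in #929. Note that the quoted test on its own also rejects tabs and newlines, so this sketch blanks those as well.

```java
// Hypothetical helper: blanks out every char that fails the quoted range test.
final class BodySanitizer {
    // Same range test as the KXmlSerializer line quoted above.
    static boolean isValid(char c) {
        return (c >= 0x20 && c <= 0xd7ff) || (c >= 0xe000 && c <= 0xfffd);
    }

    static String replaceInvalid(String body) {
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            out.append(isValid(c) ? c : ' ');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // An emoji is stored as two UTF-16 chars, both outside the valid range.
        String body = "Ja \uD83D\uDE0B";
        System.out.println("[" + replaceInvalid(body) + "]"); // [Ja   ]
    }
}
```

Each half of the emoji becomes one space, so the text survives but the icon is lost, as described above.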

moxie0 pushed a commit that referenced this issue Jun 23, 2014
Fixes #342

- using regex pattern/matcher to escape chars below 0x0020 and
  above 0xd7ff
- using String.Replace to escape XML entities
- changed XmlPullParser from Xml.newPullParser() to
  XmlPullParserFactory parser to fix import on GB
@moxie0 moxie0 closed this as completed in d429f91 Jun 24, 2014
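
A sketch of the escaping approach the commit message describes, under stated assumptions: the class name, the exact pattern, and the choice of decimal character references are illustrative and may differ from the code in d429f91.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the two-step escaping described in the commit message.
final class XmlBodyEscaper {
    // Matches anything outside 0x0020..0xd7ff. Java regex works on code points,
    // so an emoji (a surrogate pair) is matched as a single two-char group.
    private static final Pattern OUT_OF_RANGE = Pattern.compile("[^\\u0020-\\uD7FF]");

    static String escape(String body) {
        // Step 1: XML entities via String.replace; the ampersand must go first.
        String s = body.replace("&", "&amp;")
                       .replace("<", "&lt;")
                       .replace(">", "&gt;")
                       .replace("\"", "&quot;")
                       .replace("'", "&apos;");
        // Step 2: each out-of-range char becomes a decimal character reference.
        Matcher m = OUT_OF_RANGE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            StringBuilder refs = new StringBuilder();
            for (char ch : m.group().toCharArray()) {
                refs.append("&#").append((int) ch).append(';');
            }
            m.appendReplacement(out, refs.toString());
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b \uD83D\uDE0B")); // a&lt;b &#55357;&#56843;
        System.out.println(escape("say \"hi\""));       // say &quot;hi&quot;
    }
}
```

With double quotes escaped as `&quot;`, a body containing quotes can no longer terminate the attribute early, and an emoji comes out as a pair of numeric references.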
@MacaGovani

I had the problem with TextSecure 2.1, and the new 2.3.3 still has it. It's not fun: my job sometimes requires me to back up communications in he-said/she-said disputes.
body="pt toa=" I found this by searching with Notepad, following the hints from the users/developers above in this thread, using the search string xxtoaz, where the "xx" are two spaces and the "z" is a "=" (equal sign).
Shown another way, with an extra quote mark on each end:
" toa=""

@MacaGovani

Yowser, lots of manual fixing for me to do. This XML has 4966 messages.
About 200 messages have body=' (the single quote mark). The message is also very long in each case, and at the end of the body the error is repeated with a ' single quote.

One of the bad messages also looked like this:
body='hello all good citizens' ?' toa= (see the extra space, extra question mark, and extra single quote mark)

@McLoo
Contributor

McLoo commented Dec 12, 2014

@MacaGovani can you post a sample message that leads to a wrong XML file?

@MacaGovani

Hi,

sorry for the delay.

The original file has 4966 messages. I scrubbed about 95% of the messages by count from this file, as a lot of it is private in nature.

The phone is a Samsung Galaxy II.

This website had a sample sms-xml file (also attached):
http://android.riteshsahu.com/apps/sms-backup-restore

Go to the ends of the lines (one line per SMS message) and note that this sample has several more variables than the file I get from TextSecure.

And this zip file, from the same website, if it may be of use, has some style sheets that can help show sms-xml files in Excel, but Excel says it can't show my file from TextSecure, with a "parsing" data error.

Excel does work for the sample XML file.

When I trim my TextSecure .xml file down to just a hundred SMS messages, with none that have body=' (single quote), then Excel can open and view the file.

But I have not gotten the style sheets to work yet. I have never used style sheets before, and some text needs to be put in the XML file (near the top) to call out the .xsl, .xsd and .css style sheets. I think the style sheets even have the Unix epoch date converter.

There is also a free JAR program that can view sms-xml files (it's very crude). It can view the sample XML file too, but not my TextSecure XML (it errors out on the load attempt), even when I cut it down to just two messages.

So then I looked at the last 3 variables per SMS line (per message, at the end of the message) in the sms-sample file, and added those same variables to my two SMS lines, with dummy data, and it worked.

In theory, I suppose, there is one more thing for me to try once I get all the body=' messages fixed or removed (there are many, many of them; it's over a year of texting, 4966 messages): possibly use Notepad's find-and-replace to add the dummy variables and data at the end of each line. Then it may work in a viewer.

But clearly a long-term fix is needed for the body=' issue.

It seems to be triggered whenever a message has used double-quote marks.

Scott


@novoid

novoid commented Dec 13, 2014

I can provide examples as well.

My Emacs nXML mode notifies on malformed XML:

I get "Invalid Code" on the first "5" in the following message, which might indicate some kind of escaping issue with the emoticons I used in this message:

body="&#55357;&#56843;"

In TextSecure, this message was basically only a single emoji. Don't know what the second number stands for.
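
On the second number: the two values look like the two halves of a UTF-16 surrogate pair, which is how Java strings store a single character above U+FFFF. A small sketch to check that reading (the class name is made up for illustration):

```java
// Hypothetical demo: combines the two decimal values from the body above.
final class SurrogateDemo {
    static int decode(int high, int low) {
        return Character.toCodePoint((char) high, (char) low);
    }

    public static void main(String[] args) {
        System.out.println(Character.isHighSurrogate((char) 55357)); // true
        System.out.println(Character.isLowSurrogate((char) 56843));  // true
        System.out.println(Integer.toHexString(decode(55357, 56843))); // 1f60b
    }
}
```

So the pair stands for the single code point U+1F60B, one emoji. Surrogate values on their own are not legal XML characters, which would explain why a strict validator flags the first reference.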

With my current TextSecure, the issue with missing closing quotation marks seems to be gone.

Sorry, could not find out my version number in the App. But it's not 2.3.3 since I got the update suggestion right now :-)

@McLoo
Contributor

McLoo commented Dec 13, 2014

@MacaGovani thanks for your explanation.
The fields readable_date and contact_name are optional in the xml file.

I'm pretty sure Excel complains about an invalid Unicode character with error number -1072896737 (Details of XML-Error). This is because of the escaping of Unicode characters, something like &#12345;

What concerns me more is the body=' issue.
Can you post a sample message here?
Doesn't have to be a real message, just one forcing the issue.

@McLoo
Contributor

McLoo commented Dec 13, 2014

@novoid We have to do this kind of character escaping to keep XML Backup & Restore compatibility (see #1379 (comment))

@novoid

novoid commented Dec 13, 2014

OK I see. My own read-in-XML-backup-and-convert-it-to-Emacs-Orgmode works with the current format. Thanks for fixing the missing quotation marks!

@MacaGovani

Sample of actual TextSecure output, with some of the message lines having
body='
rather than the expected
body="

The root cause appears to be messages that contain quotes.
Here is a 6-message sample XML. (If there is a fix already published, where is the download on the Whisper website?)

@MacaGovani

Sample of 3 messages (the bad XML). All the personal info has been replaced with fake persons and phone numbers.

@MacaGovani

No description provided.

@McLoo
Contributor

McLoo commented Dec 15, 2014

@MacaGovani thanks for your reply.
I fear you need to repost one of those XMLs between 3 back ticks before and three after the XML:

```
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
 ... YOUR XML INSIDE HERE
```

And please also post the text of such a message, not only the bad XML file. Or maybe a screenshot of the text.

And by the way: what version of TextSecure are you using? All attributes in the XML file should be in double quotes, not single quotes.
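
As a side note on quoting: the XML specification accepts attribute values in either single or double quotes, so a body=' line is not malformed by that fact alone. The sketch below only demonstrates that point with the stock JAXP parser. Whether the serializer in the affected version switches quote style when a body contains a double quote is an assumption, not something confirmed in this thread, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical check: reads the body attribute back from a one-element document.
final class QuoteStyleCheck {
    static String bodyOf(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return root.getAttribute("body");
        } catch (Exception e) {
            throw new IllegalStateException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Single-quoted value containing raw double quotes.
        System.out.println(bodyOf("<sms body='He said \"hi\"' toa=\"null\" />"));
        // Double-quoted value with the quotes escaped as entities.
        System.out.println(bodyOf("<sms body=\"He said &quot;hi&quot;\" toa=\"null\" />"));
    }
}
```

Both lines parse to the same body text, so a viewer that rejects the single-quoted form is probably stumbling over something else on that line, such as the stray `' ?'` tail shown earlier.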


7 participants