Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch with comments and proposed changes #5

Closed
ylavi opened this issue May 19, 2012 · 3 comments
Closed

Patch with comments and proposed changes #5

ylavi opened this issue May 19, 2012 · 3 comments

Comments

@ylavi
Copy link

ylavi commented May 19, 2012

I have been testing this parser library, intending to use it in a project, and have noted some opportunities for changes. They take the following forms:

  1. Improvements to support for v2.1 format (while the library targets 3.0 some support exists for 2.1 so it makes sense to improve it)
  2. An addition to support non-standard parameters which may be found "in the wild"
  3. Comments on possible weaknesses in the current code

My changes are as follows, in patch form:

@@ -50,11 +50,13 @@

        private static $Spec_ElementTypes = array(
            'email' => array('internet', 'x400', 'pref'),
            'adr' => array('dom', 'intl', 'postal', 'parcel', 'home', 'work', 'pref'),
            'label' => array('dom', 'intl', 'postal', 'parcel', 'home', 'work', 'pref'),
-           'tel' => array('home', 'msg', 'work', 'pref', 'voice', 'fax', 'cell', 'video', 'pager', 'bbs', 'modem', 'car', 'isdn', 'pcs')
+           'tel' => array('home', 'msg', 'work', 'pref', 'voice', 'fax', 'cell', 'video', 'pager', 'bbs', 'modem', 'car', 'isdn', 'pcs'),
+           'url' => array('work', 'home'), // not per RFC but may be present in practice
+           'impp' => array('personal','business','home','work','mobile','pref') // http://tools.ietf.org/html/rfc4770
        );

        private static $Spec_FileElements = array('photo', 'logo', 'sound');

        /**
@@ -138,16 +140,23 @@
                    $this -> Data[] = new vCard(false, $SinglevCardRawData);
                }
            }
            else
            {
+               // Protect the BASE64 final = sign (detected by the line beginning with whitespace), otherwise the next replace will get rid of it
+               $this -> RawData = preg_replace('/(\n\s.+)=(\n)/', '$1-base64=-$2', $this -> RawData);
+
                // Joining multiple lines that are split with a hard wrap and indicated by an equals sign at the end of line
+               // i.e. quoted-printable-encoded values in v2.1 vCards
                $this -> RawData = str_replace("=\n", '', $this -> RawData);

                // Joining multiple lines that are split with a soft wrap (space or tab on the beginning of the next line
                $this -> RawData = str_replace(array("\n ", "\n\t"), '-wrap-', $this -> RawData);

+               // Restore the BASE64 =
+               $this -> RawData = str_replace("-base64=-\n", "=\n", $this -> RawData);
+
                $Lines = explode("\n", $this -> RawData);

                foreach ($Lines as $Line)
                {
                    // Lines without colons are skipped because, most likely, they contain no data.
@@ -160,10 +169,11 @@
                    //  value is just the value
                    list($Key, $Value) = explode(':', $Line, 2);

                    // Key is transformed to lowercase because, even though the element and parameter names are written in uppercase,
                    //  it is quite possible that they will be in lower- or mixed case.
+                   // The key is trimmed to allow for non-significant WSP characters as allowed by v2.1
                    $Key = strtolower(trim(self::Unescape($Key)));

                    // These two lines can be skipped as they aren't necessary at all.
                    if ($Key == 'begin' || $Key == 'end')
                    {
@@ -183,10 +193,13 @@
                    else
                    {
                        $Value = str_replace('-wrap-', '', $Value);
                    }

+                   // The value is trimmed to allow for non-significant WSP characters as allowed by v2.1
+                   // (This might remove white space which a v3.0 parser is meant to retain)
+                   // WARNING! unescaping ; characters at this point causes the ParseStructuredValue later to parse the values wrongly!
                    $Value = trim(self::Unescape($Value));
                    $Type = array();

                    // Here additional parameters are parsed
                    $KeyParts = explode(';', $Key);
@@ -201,20 +214,22 @@
                        {
                            switch ($ParamKey)
                            {
                                case 'encoding':
                                    $Encoding = $ParamValue;
-                                   if ($ParamValue == 'b')
+                                   // Allow for both v3.0 B and v2.1 BASE64 encoding parameter values
+                                   //if ($ParamValue == 'b')
+                                   if (in_array($ParamValue, array('b', 'base64')))
                                    {
                                        //$Value = base64_decode($Value);
                                    }
-                                   elseif ($ParamValue == 'quoted-printable')
+                                   elseif ($ParamValue == 'quoted-printable') // v2.1
                                    {
                                        $Value = quoted_printable_decode($Value);
                                    }
                                    break;
-                               case 'charset':
+                               case 'charset': // v2.1
                                    if ($ParamValue != 'utf-8' && $ParamValue != 'utf8')
                                    {
                                        $Value = mb_convert_encoding($Value, 'UTF-8', $ParamValue);
                                    }
                                    break;
@@ -336,11 +351,13 @@
            }

            if (is_writable($TargetPath) || (!file_exists($TargetPath) && is_writable(dirname($TargetPath))))
            {
                $RawContent = $this -> Data[$Key][$Index]['Value'];
-               if (isset($this -> Data[$Key][$Index]['Encoding']) && $this -> Data[$Key][$Index]['Encoding'] == 'b')
+               // Allow for both v3.0 B and v2.1 BASE64 encoding parameter values
+               //if (isset($this -> Data[$Key][$Index]['Encoding']) && $this -> Data[$Key][$Index]['Encoding'] == 'b')
+               if (isset($this -> Data[$Key][$Index]['Encoding']) && (in_array($this -> Data[$Key][$Index]['Encoding'], array('b', 'base64'))) )
                {
                    $RawContent = base64_decode($RawContent);
                }
                $Status = file_put_contents($TargetPath, $RawContent);
                return (bool)$Status;
@@ -493,10 +510,12 @@
         * @return string Resulting text.
         */
        private static function Unescape($Text)
        {
            return str_replace(array('\:', '\;', '\,', "\n"), array(':', ';', ',', ''), $Text);
+           // The line above removes encoded newlines ("\n") rather than converting them back to CRLF character sequences!
+           // "\N" should be treated in the same way as "\n"
        }

        /**
         * Separates the various parts of a structured value according to the spec.
         *
@@ -582,11 +601,13 @@
                else
                {
                    switch ($Parameter[0])
                    {
                        case 'encoding':
-                           if (in_array($Parameter[1], array('quoted-printable', 'b')))
+                           // Allow for both v3.0 B and v2.1 BASE64 encoding parameter values
+                           //if (in_array($Parameter[1], array('quoted-printable', 'b')))
+                           if (in_array($Parameter[1], array('quoted-printable', 'b', 'base64')))
                            {
                                $Result['encoding'] = $Parameter[1];
                            }
                            break;
                        case 'charset':
@pilsetnieks
Copy link
Member

I've incorporated most of the changes. Also, if possible, could you provide some examples of the problematic vCards you've encountered?

@ylavi
Copy link
Author

ylavi commented May 20, 2012

The only thing I actually encountered (which is why I addressed it) was TYPE in URL lines. Those were in v2.1 vCards exported by my old phone.

The issue of unescaping of semi-colons issue was just something I noticed about the code and demonstrated by adding semi-colons to a test address.

Were there issues you didn't agree with but examples might persuade you?

@pilsetnieks
Copy link
Member

Oh, no, I didn't mean that I don't agree to any issues, it's just that I asked for examples so that testing would be easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants