Another problem with rfc822 #176
Comments
Hello, what result do you expect for this field? You are right about where the problem is. According to https://www.w3.org/Protocols/rfc822/ it's not compliant with RFC822.
I understand, but this is a real-life example: it is my customer's email. What can I do?
I don't know what I can do about that, so I don't know what you can do either. What result do you expect?
Right now it's throwing an error. That kind of malformed address can appear not only in the "to" field but also in "copy" and so on. You have a method to fetch addresses:
Can't you do something like this after the assignment:
Yes, it's possible, but I'm not good enough with regex for that. If you have such a regex, I can include it.
OK, but I, on the other hand, don't know what the possible inputs are. You posted some here:
BTW, for regexes I recommend https://regex101.com/
You can find a lot of examples here: https://www.w3.org/Protocols/rfc822/#z10 Thanks for the links.
I'm also too weak with regexes :) I posted a question on Stack Overflow: https://stackoverflow.com/questions/48563429/regular-expression-for-rfc822-standard If you want to add something, feel free.
You've got a first answer there; it should work.
Great, but it doesn't work. I did this:
public function getAddresses($name)
{
$value = $this->getRawHeader($name);
$value = (is_array($value)) ? $value[0] : $value;
var_dump($value);
$value = preg_replace("/\".*?\"(*SKIP)(*FAIL)|(\w+\s[<>@]\s\w+)/", "\"$1\"", $value);
var_dump($value);
$addresses = mailparse_rfc822_parse_addresses($value);
foreach ($addresses as $i => $item) {
$addresses[$i]['display'] = $this->decodeHeader($item['display']);
}
return $addresses;
}
The first and second outputs are identical: there is no difference. It works for him because he sanitizes after the decoding step, but we need to do it before mailparse_rfc822_parse_addresses.
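The no-op described above can be reproduced without the library. This is a minimal sketch that assumes getRawHeader() hands back the still-encoded `To:` value quoted in the issue description; the variable names are illustrative only.

```php
<?php
// Same pattern as in getAddresses() above, written as a single-quoted string.
$pattern = '/".*?"(*SKIP)(*FAIL)|(\w+\s[<>@]\s\w+)/';

// Assumption: this is what getRawHeader('to') returns for the problem mail.
$raw = '=?UTF-8?Q?aFakeowska_>>_Agnieszka_fake-Fakeowska_Przegl=c4=85dy_?= '
     . '=?UTF-8?B?YnVkeW5rw7N3?= <aFakeowska@fake.com.pl>';

// A plain (not encoded) name with a single ">" between spaces, for comparison.
$plain = 'Alfred > Neuman <Neuman@BBN-TENEXA>';

$rawResult   = preg_replace($pattern, '"$1"', $raw);
$plainResult = preg_replace($pattern, '"$1"', $plain);

var_dump($rawResult === $raw); // bool(true): the raw header is left untouched
echo $plainResult, PHP_EOL;    // "Alfred > Neuman" <Neuman@BBN-TENEXA>
```

The pattern only fires when exactly one of `<`, `>` or `@` sits between two whitespace characters. In the raw header the spaces around `>>` are still encoded as `_`, and `>>` is two characters, so nothing matches.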
This string
setText is only for parsing the metadata of the email. If you are using getHeader:
public function testGetAddressesWithSpecialChars()
{
$file = __DIR__ . '/mails/m0124';
$Parser = new Parser();
$Parser->setText(file_get_contents($file));
$to = $Parser->getHeader('to');
$this->assertEquals('aFakeowska >> Agnieszka fake-Fakeowska Przeglądy budynków <aFakeowska@fake.com.pl>', $to);
}
If you are using getAddresses:
public function testGetAddressesWithSpecialChars()
{
$file = __DIR__ . '/mails/m0124';
$Parser = new Parser();
$Parser->setText(file_get_contents($file));
$to = $Parser->getAddresses('to');
$this->assertEquals('aFakeowska >> Agnieszka fake-Fakeowska Przeglądy budynków <aFakeowska@fake.com.pl>', $to);
}
So to fix this issue I need to change getAddresses, but getAddresses uses getRawHeader(), which returns the raw header string. So I need a regex that fixes this string before the value is passed to mailparse_rfc822_parse_addresses():
public function getAddresses($name)
{
$value = $this->getRawHeader($name);
$value = (is_array($value)) ? $value[0] : $value;
$value = preg_replace("REGEX NEEDED", "\"$1\"", $value);
$addresses = mailparse_rfc822_parse_addresses($value);
foreach ($addresses as $i => $item) {
$addresses[$i]['display'] = $this->decodeHeader($item['display']);
}
return $addresses;
}
It turned out that I'm using:
public function getAddresses($name)
{
$value = $this->getHeader($name);
return mailparse_rfc822_parse_addresses($value);
}
which gets the header and then decodes it before parsing the addresses.
EDIT: This seems to work:
public function getAddresses($name)
{
$value = $this->getRawHeader($name);
$value = preg_replace('/".*?"(*SKIP)(*FAIL)|(.+[<>\/\\(),.;:\[\]@]+.+)(\s<)/', '"$1"$2', $value);
$value = (is_array($value)) ? $value[0] : $value;
$addresses = mailparse_rfc822_parse_addresses($value);
foreach ($addresses as $i => $item) {
$addresses[$i]['display'] = $this->decodeHeader($item['display']);
}
return $addresses;
}
Edit2:
/**
* @dataProvider daneParsera
*
* @param $value
*/
public function testParsera($value)
{
$value = preg_replace('/".*?"(*SKIP)(*FAIL)|(.+[<>\/\\(),.;:\[\]@]+.+)(\s<)/', '"$1"$2', $value);
$this->assertTrue(is_array(mailparse_rfc822_parse_addresses($value)));
}
public function daneParsera()
{
return [
['a(a <aFakeowska@fake.com.pl>'],
['a)a <aFakeowska@fake.com.pl>'],
['a/a <aFakeowska@fake.com.pl>'],
['a\a <aFakeowska@fake.com.pl>'],
['a,a <aFakeowska@fake.com.pl>'],
['a.a <aFakeowska@fake.com.pl>'],
['a;a <aFakeowska@fake.com.pl>'],
['a:a <aFakeowska@fake.com.pl>'],
['a[a <aFakeowska@fake.com.pl>'],
['a]a <aFakeowska@fake.com.pl>'],
['a@a <aFakeowska@fake.com.pl>'],
['a-a <aFakeowska@fake.com.pl>'],
['a_a <aFakeowska@fake.com.pl>'],
];
}
It's working, but I have one issue with: Input: Actual Output: Expected Output:
I cannot deal with it :/ Maybe you can explode the phrase with
It's becoming very complex for something that doesn't respect the RFC. I tried exploding the string, but it doesn't work:
public function sanitizeAddresses($addresses)
{
$addresses = explode(">,", $addresses);
$addresses = array_map(function ($address) {
return preg_replace('/".*?"(*SKIP)(*FAIL)|(.+[<>\/\\(),.;:\[\]@]+.+)(\s<)/', '"$1"$2', $address);
}, $addresses);
return implode(">,", $addresses);
}
Maybe a good solution would be to catch the error and return the string as it is?
I hope this code is good; the comments are in the code. Hope it works.
/**
* @dataProvider daneParsera
*
* @param $addresses
*/
public function testParsera($addresses)
{
//Reg Exp to explode string with `>, `. Before `>, ` has to be an email
$re = '/(.*<[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+>),\s/U';
//Adding `, ` to the end of string for non `>, ` strings
preg_match_all($re, $addresses . ', ', $matches, PREG_SET_ORDER, 0);
if ($matches) {
$parsed = [];
//parsing every address
foreach ($matches AS $k => $w) {
$address = $w[1];
$parsed[] = preg_replace('/".*?"(*SKIP)(*FAIL)|(.+[<>\/\\(),.;:\[\]@]+.+)(\s<)/', '"$1"$2', $address);
}
//implode result together
$addresses = implode(', ', $parsed);
}
$this->assertTrue(is_array(mailparse_rfc822_parse_addresses($addresses)));
}
public function daneParsera()
{
return [
['a(a <aFakeowska@fake.com.pl>'],
['a)a <aFakeowska@fake.com.pl>'],
['a/a <aFakeowska@fake.com.pl>'],
['a\a <aFakeowska@fake.com.pl>'],
['a,a <aFakeowska@fake.com.pl>'],
['a.a <aFakeowska@fake.com.pl>'],
['a;a <aFakeowska@fake.com.pl>'],
['a:a <aFakeowska@fake.com.pl>'],
['a[a <aFakeowska@fake.com.pl>'],
['a]a <aFakeowska@fake.com.pl>'],
['a@a <aFakeowska@fake.com.pl>'],
['a-a <aFakeowska@fake.com.pl>'],
['a_a <aFakeowska@fake.com.pl>'],
['a_a <aFakeowska@fake.com.pl>, a-a <aFakeowska@fake.com.pl>'],
['a_a <aFakeowska@fake.com.pl>, a-a <aFakeowska@fake.com.pl>'],
['a>a <aFakeowska@fake.com.pl>'],
['a>,@a <aFakeowska@fake.com.pl>'],
['"a>, @a" <aFakeowska@fake.com.pl>'],
['a,a <aFakeowska@fake.com.pl>, a<a <aFakeowska@fake.com.pl>, a@a <aFakeowska@fake.com.pl>'],
['=?UTF-8?Q?aFakeowska_>>_Agnieszka_fake-Fakeowska_Przegl=c4=85dy_?= =?UTF-8?B?YnVkeW5rw7N3?= <aFakeowska@fake.com.pl>'],
['Alfred > Neuman <Neuman@BBN-TENEXA>, Alfred Neuman <Neuman@BBN-TENEXA>, "Alfred > Neuman" <Neuman@BBN-TENEXA>, Alfred > Neuman <Neuman@BBN-TENEXA>'],
];
}
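The split-then-quote idea from the test above can also be written as a standalone function that does not need the mailparse extension to try out. This is a simplified sketch, not the regex posted above: `quoteDisplayNames` is a hypothetical helper, and it assumes every entry has the form `name <local@domain>`. Bare addresses without angle brackets and RFC 822 group syntax are not handled.

```php
<?php
/**
 * Hypothetical helper (not part of PhpMimeMailParser): wraps display names
 * that contain RFC 822 "specials" in double quotes.
 * Assumption: every entry in $list looks like `name <local@domain>`.
 */
function quoteDisplayNames(string $list): string
{
    // One entry = lazy display name, then <local@domain>, then "," or end of input.
    $pattern = '/\s*(.*?)\s*<([^<>@\s]+@[^<>@\s]+)>\s*(?:,|$)/';
    if (!preg_match_all($pattern, $list, $matches, PREG_SET_ORDER)) {
        return $list; // nothing recognisable: leave the input untouched
    }
    // If the matches do not cover the whole input, do not risk dropping text.
    if (implode('', array_column($matches, 0)) !== $list) {
        return $list;
    }
    $out = [];
    foreach ($matches as $m) {
        [, $name, $email] = $m;
        $alreadyQuoted = strlen($name) >= 2 && $name[0] === '"' && substr($name, -1) === '"';
        // RFC 822 specials: ( ) < > @ , ; : \ " . [ ]
        if (!$alreadyQuoted && strpbrk($name, '()<>@,;:\\".[]') !== false) {
            $name = '"' . addcslashes($name, '"\\') . '"';
        }
        $out[] = ($name === '' ? '' : $name . ' ') . '<' . $email . '>';
    }
    return implode(', ', $out);
}

echo quoteDisplayNames('a,a <aFakeowska@fake.com.pl>, a<a <aFakeowska@fake.com.pl>'), PHP_EOL;
// "a,a" <aFakeowska@fake.com.pl>, "a<a" <aFakeowska@fake.com.pl>
```

Names that are already quoted, or that contain no specials, pass through unchanged. Encoded-words such as `=?UTF-8?Q?...?=` get no special treatment here; RFC 2047 does not allow them inside a quoted string, so a real fix would probably have to decode the name first.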
Thanks for your contribution. I don't know how I will include this in my lib, because I don't know all the impacts. One more issue:
Actual result is
The idea with a flag is very good.
It isn't a good idea to include regex in the library, or to support non-RFC-compliant structures. Rather, you should leave this to an outside library that converts emails to RFC compliance or helps with parsing. There are too many implementation specifics in the MTAs and mail clients that create MIME messages; supporting them opens up too much complexity for one library. This can be seen in many regex-based MIME parsers: they have too many failures, and there is no way to write tests for the regular expressions in a parser that cover all possibilities. Instead, there should be hooks, events and/or middleware, so that parsing can pass through an external library or function before handling. Example:
Email parsing is always going to have failures. Trying to handle every use case is not possible in one library.
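The hook/middleware idea described above can be sketched in a few lines of plain PHP. Everything here is hypothetical: `HeaderPipeline`, `add()` and `process()` are illustrative names, not the API of PhpMimeMailParser or of any PR in this thread, and the parser is a stand-in closure.

```php
<?php
// Hypothetical sketch: these names are illustrative, not a real library API.
final class HeaderPipeline
{
    /** @var callable[] */
    private $middleware = [];

    public function add(callable $middleware): void
    {
        $this->middleware[] = $middleware;
    }

    /** Runs every middleware in the order it was added, then calls $parser. */
    public function process(string $rawHeader, callable $parser)
    {
        // Build the chain from the inside out so the first middleware runs first.
        $next = $parser;
        foreach (array_reverse($this->middleware) as $mw) {
            $next = function (string $value) use ($mw, $next) {
                return $mw($value, $next);
            };
        }
        return $next($rawHeader);
    }
}

$pipeline = new HeaderPipeline();

// A user-supplied fix-up that lives outside the library: collapse whitespace.
$pipeline->add(function (string $value, callable $next) {
    return $next(preg_replace('/\s+/', ' ', trim($value)));
});

// Stand-in for the real parsing step (for example mailparse_rfc822_parse_addresses).
$parser = function (string $value): array {
    return ['raw' => $value];
};

$result = $pipeline->process('  Alfred   Neuman <Neuman@BBN-TENEXA> ', $parser);
// $result['raw'] is now 'Alfred Neuman <Neuman@BBN-TENEXA>'
```

The point of the design is that the library only has to call `$next`; any sanitising regex stays in user code, where it can be tested against the inputs that user actually sees.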
Also, you can ship middleware for parsing RFC-compliant MIME messages that PHP or the mailparse extension cannot handle correctly, for example in the case of issue #168.
@piernik The new regex works better, but it's still failing:
@fijiwebdesign I agree with most of your viewpoint. I don't know yet how to do that, but I need to find a way to manage all of these exceptions, because this is real life, or at least let developers extend PhpMimeMailParser.
@eXorus @fijiwebdesign Sure, it's a good idea, maybe the best.
@piernik In this case it's a group of emails, because you start with a label.
I don't get it. What is a group of emails? I don't know the RFC pattern at all.
I just took the example from here: https://www.w3.org/Protocols/rfc822/#z10
I've added the middleware in this PR: #180. For example, to fix the email address you'd do:
Or create a full middleware implementation:
The version with middleware is released; you can run your regex in a middleware if you want.
I have another problem with rfc822:
To: =?UTF-8?Q?aFakeowska_>>_Agnieszka_fake-Fakeowska_Przegl=c4=85dy_?= =?UTF-8?B?YnVkeW5rw7N3?= <aFakeowska@fake.com.pl>
(This is part of an .eml file.) I guess the problem is the >> in the name. It's a real-life example. Is there any solution?