UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \ at end of string #89624

Closed
anmyachev mannequin opened this issue Oct 13, 2021 · 7 comments
Assignees
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@anmyachev
Mannequin

anmyachev mannequin commented Oct 13, 2021

BPO 45461
Nosy @vstinner, @ezio-melotti, @serhiy-storchaka, @miss-islington
PRs
  • bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" codec #28939
  • [3.10] bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" codec (GH-28939) #28943
  • [3.9] bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" codec (GH-28939) #28945
Files
  • test.py: reproducer

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2021-10-14.17:51:53.099>
    created_at = <Date 2021-10-13.14:31:37.358>
    labels = ['3.10', 'type-bug', '3.9', 'expert-unicode', '3.11']
    title = "UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \\ at end of string"
    updated_at = <Date 2021-10-14.17:51:53.099>
    user = 'https://bugs.python.org/anmyachev'

    bugs.python.org fields:

    activity = <Date 2021-10-14.17:51:53.099>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2021-10-14.17:51:53.099>
    closer = 'serhiy.storchaka'
    components = ['Unicode']
    creation = <Date 2021-10-13.14:31:37.358>
    creator = 'anmyachev'
    dependencies = []
    files = ['50354']
    hgrepos = []
    issue_num = 45461
    keywords = ['patch']
    message_count = 7.0
    messages = ['403837', '403838', '403840', '403848', '403892', '403919', '403920']
    nosy_count = 6.0
    nosy_names = ['vstinner', 'ezio.melotti', 'mrabarnett', 'serhiy.storchaka', 'miss-islington', 'anmyachev']
    pr_nums = ['28939', '28943', '28945']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue45461'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @anmyachev
    Mannequin Author

    anmyachev mannequin commented Oct 13, 2021

    Expected behavior: if read() works correctly, then readline() should also work.

    The reproducer is in the attached file; just run: python test.py.

    Traceback (most recent call last):
      File "test.py", line 11, in <module>
        f.readline()
      File "C:\Users\amyachev\Miniconda3\envs\modin\lib\encodings\unicode_escape.py", line 26, in decode
        return codecs.unicode_escape_decode(input, self.errors)[0]
    UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \ at end of string

    @anmyachev anmyachev mannequin added 3.8 (EOL) end of life topic-unicode type-bug An unexpected behavior, bug, or error labels Oct 13, 2021
    @vstinner
    Member

    Can you please try to write a simpler (shorter) reproducer?

    @anmyachev
    Mannequin Author

    anmyachev mannequin commented Oct 13, 2021

    Hello!

    I can reduce it a little.
    The buffer shouldn't be made any smaller, as there seems to be some kind of relation with the buffer size used for IO operations.

    buffer = b'col1,col2,col3,col4,col5,col6\\r\\n0,2000-01-01,0,00:00:00,DuBFsyerJU,1809.3924826424557\\r\\n10,2000-01-01,10,01:00:00,AlwGHbVPpB,2853.2392617952996\\r\\n20,2000-01-01,20,02:00:00,TEkGgsYXYz,9933.278931158615\\r\\n30,2000-01-01,30,03:00:00,tfvnynVSfp,8574.917426248916\\r\\n40,2000-01-01,40,04:00:00,YOGjhztMWe,3768.71871233428\\r\\n50,2000-01-01,50,05:00:00,vkTOJSeQmU,6330.252072351792\\r\\n60,2000-01-01,60,06:00:00,LeolDfaGyv,5052.618993456892\\r\\n70,2000-01-01,70,07:00:00,OcyrbYVtyr,4287.371622852719\\r\\n80,2000-01-01,80,08:00:00,VUwDPNhcFV,3589.697826814614\\r\\n90,2000-01-01,90,09:00:00,KOadtzcNyK,4794.158259020925\\r\\n100,2000-01-01,100,10:00:00,rdSOjXJBWC,8826.736894397129\\r\\n110,2000-01-01,110,11:00:00,qzwVBOklhk,8086.105782454443\\r\\n120,2000-01-01,120,12:00:00,UTRlqVfKoD,1012.5061461339624\\r\\n130,2000-01-01,130,13:00:00,wKqEkRhkfw,2511.3137510933934\\r\\n140,2000-01-01,140,14:00:00,LxklWJbgxo,406.7116346419042\\r\\n150,2000-01-01,150,15:00:00,SxmZkdUgHv,8424.978062284761\\r\\n160,2000-01-01,160,16:00:00,nEvzypASGb,9890.252156059063\\r\\n170,2000-01-01,170,17:00:00,xiFkkjoDPB,2728.8359201479675\\r\\n180,2000-01-01,180,18:00:00,boMmgpBXgL,4231.680208002166\\r\\n190,2000-01-01,190,19:00:00,dXLJXWiXZI,7757.44902751916\\r\\n200,2000-01-01,200,20:00:00,PBdjwKoCMD,4915.090357003991\\r\\n210,2000-01-01,210,21:00:00,zGWLALpmoA,359.5243650158153\\r\\n220,2000-01-01,220,22:00:00,CfpZJoOqGZ,704.7990862762942\\r\\n230,2000-01-01,230,23:00:00,DrkxpLhpEN,520.3290677592321\\r\\n240,2000-01-02,240,00:00:00,TDKEBbZAzQ,5218.671660857721\\r\\n250,2000-01-02,250,01:00:00,gULwzvNeWO,4218.66872701774\\r\\n260,2000-01-02,260,02:00:00,ogSyzHWmNY,9026.657391329585\\r\\n270,2000-01-02,270,03:00:00,NetmmthtzN,2027.8312539582244\\r\\n280,2000-01-02,280,04:00:00,PoYiHipTzR,7667.627476518046\\r\\n290,2000-01-02,290,05:00:00,MjHIRGmsoq,4144.001792539834\\r\\n300,2000-01-02,300,06:00:00,qESRSNnNnO,5348.024681284471\\r\\n310,2000-01-02,310,07:00:00,sSIjcXWhLC,3622.467
3907599413\\r\\n320,2000-01-02,320,08:00:00,IvjrlljbeB,7500.419388155823\\r\\n330,2000-01-02,330,09:00:00,aVWVRXZjZy,3686.5972529264213\\r\\n340,2000-01-02,340,10:00:00,QKeTjcNlCG,1228.9751449454411\\r\\n350,2000-01-02,350,11:00:00,phEdHCVsbe,4254.15983968718\\r\\n360,2000-01-02,360,12:00:00,ursHJjQxRK,6099.131673115221\\r\\n370,2000-01-02,370,13:00:00,JvjcRlYcYG,1503.3586866746164\\r\\n380,2000-01-02,380,14:00:00,gzCyqHPRRb,7816.898213939008\\r\\n390,2000-01-02,390,15:00:00,lQZmobRwzt,8295.113759829599\\r\\n400,2000-01-02,400,16:00:00,qspiYGfTou,1987.8215069414816\\r\\n410,2000-01-02,410,17:00:00,mcqWMMzomf,15.878728570531964\\r\\n420,2000-01-02,420,18:00:00,fiPsxulpGU,5380.485947841902\\r\\n430,2000-01-02,430,19:00:00,gTAyTkpeez,4720.7159908343565\\r\\n440,2000-01-02,440,20:00:00,hzFbhAPvFX,946.5797295044975\\r\\n450,2000-01-02,450,21:00:00,NYNcYxsyVl,7333.850198973723\\r\\n460,2000-01-02,460,22:00:00,wvgMmIxLzo,7399.341315026157\\r\\n470,2000-01-02,470,23:00:00,bZoyzAGgEC,5464.053510955946\\r\\n480,2000-01-03,480,00:00:00,jZNaceUYyr,1390.8829937709977\\r\\n490,2000-01-03,490,01:00:00,sbfLgcCpru,9626.900131786555\\r\\n500,2000-01-03,500,02:00:00,MHpAkHfnmV,9406.471079089133\\r\\n510,2000-01-03,510,03:00:00,ENdFBGtRCq,3740.8773019724517\\r\\n520,2000-01-03,520,04:00:00,FzqXhMLHLY,4270.3585910905\\r\\n530,2000-01-03,530,05:00:00,wWinjEGhAj,8548.152649813675\\r\\n540,2000-01-03,540,06:00:00,LcxAImCvxt,4097.693176523874\\r\\n550,2000-01-03,550,07:00:00,sDhzGBYKpt,1673.7466277500146\\r\\n560,2000-01-03,560,08:00:00,jhagjcZhGU,4103.702089490347\\r\\n570,2000-01-03,570,09:00:00,ZIkRwPWyWP,9368.662605679918\\r\\n580,2000-01-03,580,10:00:00,uphgoCQwZY,3321.0096306747137\\r\\n590,2000-01-03,590,11:00:00,jEKaqqScLF,8442.084614664149\\r\\n600,2000-01-03,600,12:00:00,kSIJFBHVnL,4065.19226287942\\r\\n610,2000-01-03,610,13:00:00,YRhoANskYn,5089.668482943252\\r\\n620,2000-01-03,620,14:00:00,SnlwCSdkWf,5738.46737129545\\r\\n630,2000-01-03,630,15:00:00,ANfpLOiJTV,393.77545256928823
\\r\\n640,2000-01-03,640,16:00:00,DUxigzNtLz,6798.725575133883\\r\\n650,2000-01-03,650,17:00:00,jaJECwmWTY,5178.597327486391\\r\\n660,2000-01-03,660,18:00:00,tzrWZLSELo,7467.995039288831\\r\\n670,2000-01-03,670,19:00:00,rbUWLCKjeV,4013.698847016407\\r\\n680,2000-01-03,680,20:00:00,JKFAZgEkja,1538.6412971598695\\r\\n690,2000-01-03,690,21:00:00,uEomQhtneK,2849.6558284053976\\r\\n700,2000-01-03,700,22:00:00,VNqwqzfgXT,6756.852702484582\\r\\n710,2000-01-03,710,23:00:00,YzYqAlWMKn,9250.2543956494\\r\\n720,2000-01-04,720,00:00:00,VBrvxVqNpT,7430.930594705144\\r\\n730,2000-01-04,730,01:00:00,KxgdYwiVtl,1190.2548337790097\\r\\n740,2000-01-04,740,02:00:00,oPUENybUiS,247.4663426770396\\r\\n750,2000-01-04,750,03:00:00,bgpLfCsNrU,6472.8593061097\\r\\n760,2000-01-04,760,04:00:00,xmRUnIzNOL,5791.031151521782\\r\\n770,2000-01-04,770,05:00:00,SsYMDEINvO,347.35344936110636\\r\\n780,2000-01-04,780,06:00:00,XuorBLXsEt,9003.971751685769\\r\\n790,2000-01-04,790,07:00:00,jRYnFPYRKE,858.8836157464275\\r\\n800,2000-01-04,800,08:00:00,uRRXIdQDYH,4914.608250347407\\r\\n810,2000-01-04,810,09:00:00,nxkVSEnKXv,3586.0998633311424\\r\\n820,2000-01-04,820,10:00:00,BddLdFLDkg,9392.836980063128\\r\\n830,2000-01-04,830,11:00:00,MNuZvbMDqM,4075.512732895953\\r\\n840,2000-01-04,840,12:00:00,KfiIyqdZJq,4450.624248264806\\r\\n850,2000-01-04,850,13:00:00,ZNzdZZhipO,5155.329570863023\\r\\n860,2000-01-04,860,14:00:00,MmVEuWyJJt,7125.153628136557\\r\\n870,2000-01-04,870,15:00:00,QTVeqONJWF,7459.723393845693\\r\\n880,2000-01-04,880,16:00:00,sVHRlErfHm,5349.520468668593\\r\\n890,2000-01-04,890,17:00:00,OfcunHkqxU,2538.9594014567383\\r\\n900,2000-01-04,900,18:00:00,rXTISMpGvf,6136.26826553925\\r\\n910,2000-01-04,910,19:00:00,YYgIQPrYmN,2828.778965008356\\r\\n920,2000-01-04,920,20:00:00,acLWVYscRm,2135.4492617161204\\r\\n930,2000-01-04,930,21:00:00,ejuIuzrhoE,7853.20277523869\\r\\n940,2000-01-04,940,22:00:00,nEIyUKZvtl,9026.298438227512\\r\\n950,2000-01-04,950,23:00:00,fVrPrRMjgE,1108.9112508806\\r\\n960,2000-01
-05,960,00:00:00,aQbeIHZfrq,6779.761579736982\\r\\n970,2000-01-05,970,01:00:00,NSYmULwYsy,4710.484556444787\\r\\n980,2000-01-05,980,02:00:00,OstJdNkpJM,6696.018116272272\\r\\n990,2000-01-05,990,03:00:00,zPdwVSfwsw,1019.0631993852805\\r\\n1000,2000-01-05,1000,04:00:00,PrPiNtxItj,4786.919229745998\\r\\n1010,2000-01-05,1010,05:00:00,iTrMpbwDkd,1082.2792701135043\\r\\n1020,2000-01-05,1020,06:00:00,VIOGBhjuvc,6712.260837571906\\r\\n1030,2000-01-05,1030,07:00:00,vKfivaIyHN,8660.527086155422\\r\\n1040,2000-01-05,1040,08:00:00,bAlxEIEfpN,1415.7747325826188\\r\\n1050,2000-01-05,1050,09:00:00,cJPGJmIKdc,9816.3246377919\\r\\n1060,2000-01-05,1060,10:00:00,AdSXaKQpQX,3536.32709953549\\r\\n1070,2000-01-05,1070,11:00:00,PHntAagAlw,7431.850668273714\\r\\n1080,2000-01-05,1080,12:00:00,ZtQrFBobvY,4224.027690860892\\r\\n1090,2000-01-05,1090,13:00:00,ZuPnbhaSOU,3484.8530656320654\\r\\n1100,2000-01-05,1100,14:00:00,qOSVmejqdo,6847.384220484392\\r\\n1110,2000-01-05,1110,15:00:00,kwckywqRbb,5867.829131220223\\r\\n1120,2000-01-05,1120,16:00:00,JLrzzbUfDi,6991.180870142121\\r\\n1130,2000-01-05,1130,17:00:00,qPuDjhipNE,2544.115558392327\\r\\n1140,2000-01-05,1140,18:00:00,nTuOipVPUZ,3521.350549002792\\r\\n1150,2000-01-05,1150,19:00:00,FxTDpmsUYC,5796.837844528479\\r\\n1160,2000-01-05,1160,20:00:00,IilnnODeoz,9981.446352555968\\r\\n1170,2000-01-05,1170,21:00:00,lJpBtcVSww,8659.609927822496\\r\\n1180,2000-01-05,1180,22:00:00,uefmaifDgk,164.5549179029382\\r\\n1190,2000-01-05,1190,23:00:00,AQsKnkJxOV,455.31829622753816\\r\\n1200,2000-01-06,1200,00:00:00,IUcDyPSHIE,5727.976331105652\\r\\n1210,2000-01-06,1210,01:00:00,nrEdNiWGdi,2015.5167059418156\\r\\n1220,2000-01-06,1220,02:00:00,EflmCojQzg,9514.004760633412\\r\\n1230,2000-01-06,1230,03:00:00,LsAIvtooWr,7898.8225145572\\r\\n1240,2000-01-06,1240,04:00:00,yiDOUysGHw,4219.262059231663\\r\\n1250,2000-01-06,1250,05:00:00,idWAZATxwy,3043.2304072778616\\r\\n1260,2000-01-06,1260,06:00:00,sBedlknKzY,3840.820372936372\\r\\n1270,2000-01-06,1270,07:00:00,ReE
mhVRAjb,6966.434389542963\\r\\n1280,2000-01-06,1280,08:00:00,XnFrfzMBKt,6041.8596064524045\\r\\n1290,2000-01-06,1290,09:00:00,MaMMHEWEIf,2569.2675325271707\\r\\n1300,2000-01-06,1300,10:00:00,OUpokSyVfO,7387.813510302333\\r\\n1310,2000-01-06,1310,11:00:00,VgCigxOcbF,7695.008235452545\\r\\n1320,2000-01-06,1320,12:00:00,ouRNYgSzXq,3293.250454887212\\r\\n1330,2000-01-06,1330,13:00:00,iQczJExipS,1892.9945453269115\\r\\n1340,2000-01-06,1340,14:00:00,vVbLlDWFCr,7105.276586964716\\r\\n1350,'
    
    with open("bug_csv.csv", "wb") as f:
        f.write(buffer)
    
    with open("bug_csv.csv", encoding="unicode_escape", newline="") as f:
        f.readline()

    @mrabarnett
    Mannequin

    mrabarnett mannequin commented Oct 13, 2021

    It can be shortened to this:

    buffer = b"a" * 8191 + b"\\r\\n"
    
    with open("bug_csv.csv", "wb") as f:
        f.write(buffer)
    
    with open("bug_csv.csv", encoding="unicode_escape", newline="") as f:
        f.readline()

    To me it looks like it's reading in blocks of 8K and then decoding them, but it isn't correctly handling an escape sequence that happens to cross a block boundary.

    @serhiy-storchaka serhiy-storchaka added 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes and removed 3.8 (EOL) end of life labels Oct 13, 2021
    @serhiy-storchaka serhiy-storchaka self-assigned this Oct 13, 2021
    @serhiy-storchaka serhiy-storchaka added the 3.9 only security fixes label Oct 13, 2021
    @serhiy-storchaka
    Member

    New changeset c96d154 by Serhiy Storchaka in branch 'main':
    bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" codec (GH-28939)
    c96d154
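
For readers who want a feel for what "fixing the IncrementalDecoder" involves, here is a deliberately simplified sketch. It is not the patch from GH-28939. The class name `HoldBackslashDecoder` is hypothetical, and it only defers a single dangling backslash at the end of a chunk; an escape cut at any other point (for example in the middle of `\x41` or `\u0041`) would still fail. It builds on `codecs.BufferedIncrementalDecoder`, the standard-library helper that keeps unconsumed bytes for the next call.

```python
import codecs


class HoldBackslashDecoder(codecs.BufferedIncrementalDecoder):
    """Hypothetical, simplified decoder: defers only a dangling trailing backslash."""

    def _buffer_decode(self, input, errors, final):
        data = bytes(input)
        if not final:
            # An odd number of trailing backslashes means the last one starts
            # an escape sequence whose remainder has not arrived yet.
            trailing = len(data) - len(data.rstrip(b"\\"))
            if trailing % 2 == 1:
                data = data[:-1]
        # Returns (text, bytes consumed); the base class keeps whatever was
        # not consumed and prepends it to the next chunk.
        return codecs.unicode_escape_decode(data, errors)


decoder = HoldBackslashDecoder()
first = decoder.decode(b"a" * 8191 + b"\\")    # backslash is held back
second = decoder.decode(b"r\\n", final=True)   # completes "\r", then "\n"
print(len(first), repr(second))
```

The design point is that the decoder reports how many bytes it consumed, so an incomplete tail can stay in the buffer instead of being treated as an error.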

    @serhiy-storchaka
    Member

    New changeset 0bff4cc by Miss Islington (bot) in branch '3.10':
    [3.10] bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" codec (GH-28939) (GH-28943)
    0bff4cc

    @serhiy-storchaka
    Member

    New changeset 7c722e3 by Serhiy Storchaka in branch '3.9':
    [3.9] bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" codec (GH-28939) (GH-28945)
    7c722e3

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022