Inefficient derived instance #29

Open
vagarenko opened this issue Jan 27, 2017 · 3 comments

Comments

@vagarenko

Please see http://stackoverflow.com/questions/41884350/performance-difference-with-unpacking?noredirect=1#comment70951584_41884350

I have two datatypes for a 4x4 matrix of Float:

  1. Matrix made of four four-element vectors of Float
  2. Matrix of 16 elements of Float
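For concreteness, the two layouts might be declared roughly as follows. This is only a sketch: the names `Vector4f`, `MatrixOfVectors`, `MatrixOfFloats` and `forceBoth` are hypothetical, and it assumes strict, unpacked fields with `NFData` instances that fall back on the Generic-based default `rnf`.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Control.DeepSeq (NFData, rnf)
import GHC.Generics (Generic)

-- Hypothetical four-element vector with strict, unpacked fields.
data Vector4f = Vector4f
  {-# UNPACK #-} !Float {-# UNPACK #-} !Float
  {-# UNPACK #-} !Float {-# UNPACK #-} !Float
  deriving (Show, Generic)

-- Empty instance body: rnf comes from the Generic-based default method.
instance NFData Vector4f

-- Layout 1: four unpacked vectors of four floats each.
data MatrixOfVectors = MatrixOfVectors
  {-# UNPACK #-} !Vector4f {-# UNPACK #-} !Vector4f
  {-# UNPACK #-} !Vector4f {-# UNPACK #-} !Vector4f
  deriving (Show, Generic)

instance NFData MatrixOfVectors

-- Layout 2: sixteen unpacked floats directly in the constructor.
data MatrixOfFloats = MatrixOfFloats
  {-# UNPACK #-} !Float {-# UNPACK #-} !Float {-# UNPACK #-} !Float {-# UNPACK #-} !Float
  {-# UNPACK #-} !Float {-# UNPACK #-} !Float {-# UNPACK #-} !Float {-# UNPACK #-} !Float
  {-# UNPACK #-} !Float {-# UNPACK #-} !Float {-# UNPACK #-} !Float {-# UNPACK #-} !Float
  {-# UNPACK #-} !Float {-# UNPACK #-} !Float {-# UNPACK #-} !Float {-# UNPACK #-} !Float
  deriving (Show, Generic)

instance NFData MatrixOfFloats

-- Forces one value of each layout to normal form.
forceBoth :: MatrixOfVectors -> MatrixOfFloats -> ()
forceBoth a b = rnf a `seq` rnf b
```

Once unpacked, both layouts hold the same sixteen floats, which is why one would expect `rnf` to cost the same for each.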

I expected the two to compile to identical code, but the derived Core for the NFData instance turns out to be different:

  1. Four vectors of four elements:
-- RHS size: {terms: 5, types: 18, coercions: 0}
$fNFDataMatrix4x4f_$s$dmrnf :: Matrix4x4f -> ()
$fNFDataMatrix4x4f_$s$dmrnf =
  \ (eta_X8zo :: Matrix4x4f) ->
    case eta_X8zo
    of _
    { Matrix4x4f dt_d8x5 dt1_d8x6 dt2_d8x7 dt3_d8x8 dt4_d8x9 dt5_d8xa
                 dt6_d8xb dt7_d8xc dt8_d8xd dt9_d8xe dt10_d8xf dt11_d8xg dt12_d8xh
                 dt13_d8xi dt14_d8xj dt15_d8xk ->
    ()
    }
  2. Matrix of 16 elements:
-- RHS size: {terms: 70, types: 954, coercions: 0}
$fNFDataMatrix4x4f_$s$dmrnf :: Matrix4x4f -> ()
$fNFDataMatrix4x4f_$s$dmrnf =
  \ (eta_a8Bf :: Matrix4x4f) ->
    case eta_a8Bf
    of _
    { Matrix4x4f ww1_s8VG ww2_s8VH ww3_s8VI ww4_s8VJ ww5_s8VK ww6_s8VL
                 ww7_s8VM ww8_s8VN ww9_s8VO ww10_s8VP ww11_s8VQ ww12_s8VR ww13_s8VS
                 ww14_s8VT ww15_s8VU ww16_s8VV ->
    case $w$cfrom
           ww1_s8VG
           ww2_s8VH
           ww3_s8VI
           ww4_s8VJ
           ww5_s8VK
           ww6_s8VL
           ww7_s8VM
           ww8_s8VN
           ww9_s8VO
           ww10_s8VP
           ww11_s8VQ
           ww12_s8VR
           ww13_s8VS
           ww14_s8VT
           ww15_s8VU
           ww16_s8VV
    of _ { (# ww18_s8Yp, ww19_s8Yq #) ->
    case ww18_s8Yp of _ { :*: ww21_s8W3 ww22_s8Wj ->
    case ww21_s8W3 of _ { :*: ww24_s8W6 ww25_s8Wc ->
    case ww24_s8W6 of _ { :*: ww27_s8W9 ww28_s8Wa ->
    case ww25_s8Wc of _ { :*: ww30_s8Wf ww31_s8Wg ->
    case ww22_s8Wj of _ { :*: ww33_s8Wm ww34_s8Ws ->
    case ww33_s8Wm of _ { :*: ww36_s8Wp ww37_s8Wq ->
    case ww34_s8Ws of _ { :*: ww39_s8Wv ww40_s8Ww ->
    case ww27_s8W9 of _ { __DEFAULT ->
    case ww28_s8Wa of _ { __DEFAULT ->
    case ww30_s8Wf of _ { __DEFAULT ->
    case ww31_s8Wg of _ { __DEFAULT ->
    case ww36_s8Wp of _ { __DEFAULT ->
    case ww37_s8Wq of _ { __DEFAULT ->
    case ww39_s8Wv of _ { __DEFAULT ->
    case ww40_s8Ww of _ { __DEFAULT -> $fNFDataMatrix4x4f3 ww19_s8Yq }
    }
    }
    }
    }
    }
    }
    }
    }
    }
    }
    }
    }
    }
    }
    }
    }
@RyanGlScott
Member

Indeed, the GHC.Generics-based default is not well suited for strict fields at the moment. There have been some ideas floating around to address this (see here), but it would probably require some more type-level hackery than deepseq currently implements.

Until then, you should probably define your NFData instances by hand for types whose weak head normal form is already their normal form (WHNF = NF), like so:

-- Requires the BangPatterns extension.
instance NFData Matrix4x4f where
  rnf !_ = ()  -- forcing to WHNF is enough when WHNF = NF
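A self-contained version of this workaround might look like the sketch below. The types `V2` and `Quad` are hypothetical stand-ins for the matrix type. The second instance uses `rwhnf` from `Control.DeepSeq`, which should be equivalent to the bang pattern and does not need the extension.

```haskell
{-# LANGUAGE BangPatterns #-}

import Control.DeepSeq (NFData (..), rwhnf)

-- Hypothetical type whose fields are all strict and unpacked, so a value
-- in weak head normal form is already fully evaluated.
data V2 = V2 {-# UNPACK #-} !Float {-# UNPACK #-} !Float
  deriving (Eq, Show)

-- Manual instance as suggested above: forcing the constructor is enough.
instance NFData V2 where
  rnf !_ = ()

-- Second hypothetical type, written with rwhnf instead of a bang pattern.
data Quad = Quad
  {-# UNPACK #-} !Float {-# UNPACK #-} !Float
  {-# UNPACK #-} !Float {-# UNPACK #-} !Float
  deriving (Eq, Show)

instance NFData Quad where
  rnf = rwhnf
```

Neither instance is safe for a type with a lazy field or a field of a lazy type such as a list, because forcing to WHNF would then leave parts of the value unevaluated.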

@arybczak

arybczak commented Mar 30, 2020

This isn't a problem with strict fields; it's a problem of GHC not optimizing the generics away for larger data types. See https://gitlab.haskell.org/ghc/ghc/-/merge_requests/2965 for relevant discussion.
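To show where the `:*:` matches in the unoptimized Core come from, here is a simplified sketch of a Generic-based `rnf`. It is not deepseq's actual implementation: the class `GRnf`, the function `genericRnf` and the type `Pair` are hypothetical names, and the empty-type case (`V1`) is omitted.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeOperators #-}

import Control.DeepSeq (NFData (..))
import GHC.Generics

-- Hypothetical class that walks the generic representation of a value.
class GRnf f where
  grnf :: f p -> ()

-- A constructor without fields has nothing to force.
instance GRnf U1 where
  grnf U1 = ()

-- A product node: force the left subtree, then the right one.
instance (GRnf a, GRnf b) => GRnf (a :*: b) where
  grnf (x :*: y) = grnf x `seq` grnf y

-- A sum node: force whichever alternative is present.
instance (GRnf a, GRnf b) => GRnf (a :+: b) where
  grnf (L1 x) = grnf x
  grnf (R1 y) = grnf y

-- Metadata wrappers carry no data of their own.
instance GRnf a => GRnf (M1 i c a) where
  grnf (M1 x) = grnf x

-- A field: defer to the field type's own NFData instance.
instance NFData a => GRnf (K1 i a) where
  grnf (K1 x) = rnf x

-- Convert to the representation with 'from', then force it.
genericRnf :: (Generic a, GRnf (Rep a)) => a -> ()
genericRnf = grnf . from

-- Small hypothetical record to try it on.
data Pair = Pair Int Int
  deriving (Show, Generic)
```

For a record with n fields, `from` builds a balanced tree of `:*:` nodes. If GHC does not inline and simplify that tree away, every call to `rnf` allocates it and pattern-matches back through it.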

Basically, with the above patch, the derived generic NFData instance for a type:

data Z = Z1
  { x00 :: Int
  , x01 :: Int
  , x02 :: Int
  , x03 :: Int
  , x04 :: Int
  , x05 :: Int
  , x06 :: Int
  , x07 :: Int
  , x08 :: Int
  , x09 :: Int
  , x10 :: Int
  , x11 :: Int
  , x12 :: Int
  , x13 :: Int
  , x14 :: Int
  , x15 :: Int
  , x16 :: Int
  , x17 :: Int
  , x18 :: Int
  , x19 :: Int
  }

ends up as the following core:

$fNFDataZ_$crnf
  = \ w_sqVt ->
      case w_sqVt of
      { Z1 ww1_sqVw ww2_sqVB ww3_sqVG ww4_sqVL ww5_sqVQ ww6_sqVV ww7_sqW0
           ww8_sqW5 ww9_sqWa ww10_sqWf ww11_sqWk ww12_sqWp ww13_sqWu ww14_sqWz
           ww15_sqWE ww16_sqWJ ww17_sqWO ww18_sqWT ww19_sqWY ww20_sqX3 ->
      case ww1_sqVw of { I# ww22_sqVz ->
      case ww2_sqVB of { I# ww24_sqVE ->
      case ww3_sqVG of { I# ww26_sqVJ ->
      case ww4_sqVL of { I# ww28_sqVO ->
      case ww5_sqVQ of { I# ww30_sqVT ->
      case ww6_sqVV of { I# ww32_sqVY ->
      case ww7_sqW0 of { I# ww34_sqW3 ->
      case ww8_sqW5 of { I# ww36_sqW8 ->
      case ww9_sqWa of { I# ww38_sqWd ->
      case ww10_sqWf of { I# ww40_sqWi ->
      case ww11_sqWk of { I# ww42_sqWn ->
      case ww12_sqWp of { I# ww44_sqWs ->
      case ww13_sqWu of { I# ww46_sqWx ->
      case ww14_sqWz of { I# ww48_sqWC ->
      case ww15_sqWE of { I# ww50_sqWH ->
      case ww16_sqWJ of { I# ww52_sqWM ->
      case ww17_sqWO of { I# ww54_sqWR ->
      case ww18_sqWT of { I# ww56_sqWW ->
      case ww19_sqWY of { I# ww58_sqX1 ->
      case ww20_sqX3 of { I# ww60_sqX6 -> () }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }

and for the same type with strict fields:

data Z = Z1
  { x00 :: !Int
  , x01 :: !Int
  , x02 :: !Int
  , x03 :: !Int
  , x04 :: !Int
  , x05 :: !Int
  , x06 :: !Int
  , x07 :: !Int
  , x08 :: !Int
  , x09 :: !Int
  , x10 :: !Int
  , x11 :: !Int
  , x12 :: !Int
  , x13 :: !Int
  , x14 :: !Int
  , x15 :: !Int
  , x16 :: !Int
  , x17 :: !Int
  , x18 :: !Int
  , x19 :: !Int
  }

the core ends up like this:

$fNFDataZ_$crnf
  = \ x2_Xh8 ->
      case x2_Xh8 of
      { Z1 dt_dlfu dt1_dlfv dt2_dlfw dt3_dlfx dt4_dlfy dt5_dlfz dt6_dlfA
           dt7_dlfB dt8_dlfC dt9_dlfD dt10_dlfE dt11_dlfF dt12_dlfG dt13_dlfH
           dt14_dlfI dt15_dlfJ dt16_dlfK dt17_dlfL dt18_dlfM dt19_dlfN ->
      ()
      }

However, if GHC can't optimize the generics away (that is, without the above patch), then the core ends up being:

$fNFDataZ_$crnf
  = \ x2_akJI ->
      case ($fGenericZ1 x2_akJI) `cast` <Co:1074> of
      { :*: x3_alRi y_alRj ->
      case x3_alRi of { :*: x4_Xmkg y1_Xmki ->
      case x4_Xmkg of { :*: x5_Xmkn y2_Xmkp ->
      case x5_Xmkn of { :*: x6_Xmku y3_Xmkw ->
      case x6_Xmku `cast` <Co:28> of { I# ipv_alQI ->
      case y3_Xmkw `cast` <Co:28> of { I# ipv1_Xmk2 ->
      case y2_Xmkp of { :*: x7_XmkC y4_XmkE ->
      case x7_XmkC `cast` <Co:28> of { I# ipv2_Xmk8 ->
      case y4_XmkE of { :*: x8_XmkL y5_XmkN ->
      case x8_XmkL `cast` <Co:28> of { I# ipv3_Xmkh ->
      case y5_XmkN `cast` <Co:28> of { I# ipv4_Xmkj ->
      case y1_Xmki of { :*: x9_XmkK y6_XmkM ->
      case x9_XmkK of { :*: x10_XmkR y7_XmkT ->
      case x10_XmkR `cast` <Co:28> of { I# ipv5_XmO2 ->
      case y7_XmkT `cast` <Co:28> of { I# ipv6_XmO6 ->
      case y6_XmkM of { :*: x11_XmkZ y8_Xml1 ->
      case x11_XmkZ `cast` <Co:28> of { I# ipv7_Xmkv ->
      case y8_Xml1 of { :*: x12_Xml8 y9_Xmla ->
      case x12_Xml8 `cast` <Co:28> of { I# ipv8_XmOA ->
      case y9_Xmla `cast` <Co:28> of { I# ipv9_XmkG ->
      case y_alRj of { :*: x13_Xml2 y10_Xml4 ->
      case x13_Xml2 of { :*: x14_Xml9 y11_Xmlb ->
      case x14_Xml9 of { :*: x15_Xmlg y12_Xmli ->
      case x15_Xmlg `cast` <Co:28> of { I# ipv10_XmOQ ->
      case y12_Xmli `cast` <Co:28> of { I# ipv11_XmOU ->
      case y11_Xmlb of { :*: x16_Xmlo y13_Xmlq ->
      case x16_Xmlo `cast` <Co:28> of { I# ipv12_XmkU ->
      case y13_Xmlq of { :*: x17_Xmlx y14_Xmlz ->
      case x17_Xmlx `cast` <Co:28> of { I# ipv13_Xml3 ->
      case y14_Xmlz `cast` <Co:28> of { I# ipv14_XmPs ->
      case y10_Xml4 of { :*: x18_Xmlw y15_Xmly ->
      case x18_Xmlw of { :*: x19_XmlD y16_XmlF ->
      case x19_XmlD `cast` <Co:28> of { I# ipv15_XmPA ->
      case y16_XmlF `cast` <Co:28> of { I# ipv16_XmPE ->
      case y15_Xmly of { :*: x20_XmlL y17_XmlN ->
      case x20_XmlL `cast` <Co:28> of { I# ipv17_Xmlh ->
      case y17_XmlN of { :*: x21_XmlU y18_XmlW ->
      case x21_XmlU `cast` <Co:28> of { I# ipv18_XmQ8 ->
      case y18_XmlW `cast` <Co:28> of { I# ipv19_Xmls -> () }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }
      }

@arybczak

arybczak commented Dec 31, 2020

FYI, this will be fixed in GHC 9.2 (at least for types with one constructor; for types with more, restrictions apply).
