Exceeds buffer capacity #23

Closed
aleczhanshi opened this issue Mar 31, 2020 · 10 comments

Comments

@aleczhanshi

Hi there,

While experimenting with different configurations, I ran into ERROR: couldn't map level GlobalBuffer: mapped tile size 428201 exceeds buffer capacity 65536. I've been looking through the codebase to figure out why this happens, but it is hard to work out from the code alone.

Could you briefly explain (ideally with the math) how the mapped tile size and the buffer capacity are computed from the problem shape (RSPQCKN) and the arch spec (sizeKB, entries, word-bits, instances, etc.)?

Here are my settings:
problem shape = (R = 7; S = 7; P = 112; Q = 112; C = 3; K = 64; N = 1; Wstride = 2; Hstride = 2;)
factors = ("R1 S1 P112 Q1 C1 K1 N1")
and arch spec (sizeKB = 128; instances = 1; meshX = 1; word-bits = 16; block-size = 4; read_bandwidth = 16; write_bandwidth = 16;)

Thanks in advance!

@angshuman-parashar
Collaborator

Factors are multiplicatively cumulative, so to determine the tile size at the Global buffer I'll need to know the factors at all levels inside of the Global buffer as well.

@aleczhanshi
Author

aleczhanshi commented Mar 31, 2020

@angshuman-parashar Thanks. Below are the factors of the global buffer. Is that what we need to compute the tile size?

    {
      target = 4;
      type = "spatial";
      factors = "R1 S1 P1 Q8 C1 K2 N1";
      permutation = "QKRSPCN";
      split = 2;
    }, 
    {
      target = 4;
      type = "temporal";
      factors = "R1 S1 P112 Q1 C1 K1 N1";
      permutation = "PRSQCKN";
    }, 

@angshuman-parashar
Collaborator

No, that's not enough. As you can see, that's storage level #4. I need to know the factors for levels 0, 1, 2 and 3 as well: the product of all of those factors gives you the tile size at level 4. Perhaps that explains why your buffer is overflowing?

@aleczhanshi
Author

@angshuman-parashar Thanks! What are the equations behind this? For example, is the tile size for level 0 the product of all factors (R, S, P, Q, C, K, N)? For upper levels, could you show me the equation to compute the tile size based on the lower levels and itself? I'm putting all the factors below. Thanks!

    {
      target = 0;
      type = "temporal";
      factors = "R1 S1 P1 Q1 C1 K16 N1";
      permutation = "KRSPQCN";
    }, 
    {
      target = 1;
      type = "temporal";
      factors = "R7 S1 P1 Q1 C1 K1 N1";
      permutation = "RSPQCKN";
    }, 
    {
      target = 2;
      type = "temporal";
      factors = "R1 S1 P1 Q1 C1 K1 N1";
      permutation = "RSPQCKN";
    }, 
    {
      target = 3;
      type = "spatial";
      factors = "R1 S7 P1 Q1 C1 K2 N1";
      permutation = "SKRPQCN";
      split = 0;
    }, 
    {
      target = 3;
      type = "temporal";
      factors = "R1 S1 P1 Q1 C1 K1 N1";
      permutation = "RSPQCKN";
    }, 
    {
      target = 4;
      type = "spatial";
      factors = "R1 S1 P1 Q8 C1 K2 N1";
      permutation = "QKRSPCN";
      split = 2;
    }, 
    {
      target = 4;
      type = "temporal";
      factors = "R1 S1 P112 Q1 C1 K1 N1";
      permutation = "PRSQCKN";
    }, 

@angshuman-parashar
Collaborator

angshuman-parashar commented Mar 31, 2020

First calculate each dimension as the product of all factors. E.g., multiplying over all levels (temporal + spatial) from 0 through 4, we get: R = 7, S=7, P=112, Q=8, C=1, K=64, N=1. This gives us the problem- or iteration-space tile at level 4. Next, project this problem-space into the data-spaces (i.e., tensors) to obtain the tile shapes for those spaces. E.g., weights = R*S*C*K = 3,136, outputs = N*K*Q*P = 57,344 and inputs = N*C*(S+(Q-1)*Hstride)*(R+(P-1)*Wstride) = 4,809 (assuming dilation=1), giving us a total of 65,289 entries. You can multiply that by the word size to get the capacity in bytes.
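The arithmetic in this comment can be reproduced with a short script. This is only a sketch of the calculation as described here, not Timeloop's own code: the helper names (`parse_factors`, `cumulative_tile`, `dataspace_tiles`) are made up for illustration, and the factor strings are the ones posted earlier in the thread for targets 0 through 4.

```python
# Factor strings for targets 0 through 4, copied from the mapping posted above.
# Temporal and spatial entries both contribute to the cumulative tile.
LEVEL_FACTORS = [
    "R1 S1 P1 Q1 C1 K16 N1",   # target 0, temporal
    "R7 S1 P1 Q1 C1 K1 N1",    # target 1, temporal
    "R1 S1 P1 Q1 C1 K1 N1",    # target 2, temporal
    "R1 S7 P1 Q1 C1 K2 N1",    # target 3, spatial
    "R1 S1 P1 Q1 C1 K1 N1",    # target 3, temporal
    "R1 S1 P1 Q8 C1 K2 N1",    # target 4, spatial
    "R1 S1 P112 Q1 C1 K1 N1",  # target 4, temporal
]


def parse_factors(spec):
    """Turn 'R1 S7 ...' into {'R': 1, 'S': 7, ...} (hypothetical helper)."""
    return {token[0]: int(token[1:]) for token in spec.split()}


def cumulative_tile(specs):
    """Multiply the per-level factors of each dimension together."""
    tile = {}
    for spec in specs:
        for dim, factor in parse_factors(spec).items():
            tile[dim] = tile.get(dim, 1) * factor
    return tile


def dataspace_tiles(t, wstride, hstride):
    """Project the problem-space tile onto the three tensors (dilation = 1)."""
    weights = t["R"] * t["S"] * t["C"] * t["K"]
    outputs = t["N"] * t["K"] * t["Q"] * t["P"]
    inputs = (t["N"] * t["C"]
              * (t["S"] + (t["Q"] - 1) * hstride)
              * (t["R"] + (t["P"] - 1) * wstride))
    return {"Weights": weights, "Inputs": inputs, "Outputs": outputs}


tile = cumulative_tile(LEVEL_FACTORS)
sizes = dataspace_tiles(tile, wstride=2, hstride=2)
print(tile)
print(sizes, "total =", sum(sizes.values()))
```

The script yields the same per-tensor sizes and the same total of 65,289 entries as the hand calculation above.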

Now I'm curious, because it doesn't match the error message (unless I messed up the math somewhere above). Could you please email or upload the entire .cfg (arch, mapping, everything) so that I can reproduce at my end?

@aleczhanshi
Author

@angshuman-parashar Thanks for doing the computation! I really appreciate it. The error for the set of parameters below is ERROR: couldn't map level GlobalBuffer: mapped tile size 62153 exceeds buffer capacity 32768. I've done the math and got the same result as you, 65289, but the tool reports 62153 instead. Not much of a difference, but any clue why this is the case?

arch : 
{
  arithmetic : 
  {
    name = "MACs";
    instances = 256;
    word-bits = 16;
    meshX = 16;
  };
  storage = ( 
    {
      name = "PsumRegFile";
      entries = 16;
      instances = 256;
      meshX = 16;
      word-bits = 16;
      read_bandwidth = 2;
      write_bandwidth = 2;
    }, 
    {
      name = "WeightRegFile";
      entries = 192;
      instances = 256;
      meshX = 16;
      word-bits = 16;
      read_bandwidth = 2;
      write_bandwidth = 2;
    }, 
    {
      name = "InputRegFile";
      entries = 12;
      instances = 256;
      meshX = 16;
      word-bits = 16;
      read_bandwidth = 2;
      write_bandwidth = 2;
    }, 
    {
      name = "DummyBuffer";
      entries = 0;
      instances = 16;
      meshX = 16;
      word-bits = 16;
    }, 
    {
      name = "GlobalBuffer";
      sizeKB = 64;
      instances = 1;
      meshX = 1;
      word-bits = 16;
      block-size = 4;
      read_bandwidth = 16;
      write_bandwidth = 16;
    }, 
    {
      name = "DRAM";
      technology = "DRAM";
      instances = 1;
      word-bits = 16;
    } );
};

problem : 
{
  R = 7;
  S = 7;
  P = 112;
  Q = 112;
  C = 3;
  K = 64;
  N = 1;
  Wstride = 2;
  Hstride = 2;
};

mapping = (
    {
      target = 0;
      type = "datatype";
      keep = [ "Outputs" ];
      bypass = [ "Weights", "Inputs" ];
    }, 
    {
      target = 1;
      type = "datatype";
      keep = [ "Weights" ];
      bypass = [ "Inputs", "Outputs" ];
    }, 
    {
      target = 2;
      type = "datatype";
      keep = [ "Inputs" ];
      bypass = [ "Weights", "Outputs" ];
    }, 
    {
      target = 3;
      type = "datatype";
      keep = [ ];
      bypass = [ "Weights", "Inputs", "Outputs" ];
    }, 
    {
      target = 4;
      type = "datatype";
      keep = [ "Inputs", "Outputs" ];
      bypass = [ "Weights" ];
    }, 
    {
      target = 5;
      type = "datatype";
      keep = [ "Weights", "Inputs", "Outputs" ];
      bypass = [ ];
    }, 
    {
      target = 0;
      type = "temporal";
      factors = "R1 S1 P1 Q1 C1 K16 N1";
      permutation = "KRSPQCN";
    }, 
    {
      target = 1;
      type = "temporal";
      factors = "R7 S1 P1 Q1 C1 K1 N1";
      permutation = "RSPQCKN";
    }, 
    {
      target = 2;
      type = "temporal";
      factors = "R1 S1 P1 Q1 C1 K1 N1";
      permutation = "RSPQCKN";
    }, 
    {
      target = 3;
      type = "spatial";
      factors = "R1 S7 P1 Q1 C1 K2 N1";
      permutation = "SKRPQCN";
      split = 0;
    }, 
    {
      target = 3;
      type = "temporal";
      factors = "R1 S1 P1 Q1 C1 K1 N1";
      permutation = "RSPQCKN";
    }, 
    {
      target = 4;
      type = "spatial";
      factors = "R1 S1 P1 Q8 C1 K2 N1";
      permutation = "QKRSPCN";
      split = 2;
    }, 
    {
      target = 4;
      type = "temporal";
      factors = "R1 S1 P112 Q1 C1 K1 N1";
      permutation = "PRSQCKN";
    }, 
    {
      target = 5;
      type = "temporal";
      factors = "R1 S1 P1 Q14 C3 K1 N1";
      permutation = "CQKRSPN";
    }
);

@aleczhanshi
Author

aleczhanshi commented Mar 31, 2020

@angshuman-parashar Another question: I assume that the permutation does not affect the tile size. Is that true?

Further, I guess that only the non-unit factors matter in the permutation as far as performance is concerned. For example, if I have R1 S1 P1 Q8 C1 K2 N1, only the relative order of Q and K affects performance, because all other factors are one. In other words, {QK}RSPCN should be the same as RSPCN{QK}, and also as {QK}PCNRS. Is that correct?

@angshuman-parashar
Collaborator

angshuman-parashar commented Mar 31, 2020

Re. your earlier question: Look at the bypass settings. Weights are being bypassed at that level. 65289 - 62153 = 3136, which is the weight tile :).
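To make the bypass effect concrete, here is a small sketch using the numbers from this thread. The function names are hypothetical, and the capacity formula (bytes times 8, divided by word-bits) is an assumption inferred from the two error messages quoted above (sizeKB = 128 reported as 65536, sizeKB = 64 reported as 32768), not taken from the Timeloop source.

```python
# Per-tensor tile sizes at GlobalBuffer, as computed earlier in the thread.
DATASPACE_TILES = {"Weights": 3136, "Inputs": 4809, "Outputs": 57344}


def kept_tile_size(tiles, keep):
    """Sum only the tensors that the level keeps; bypassed ones take no space."""
    return sum(size for name, size in tiles.items() if name in keep)


def capacity_in_words(size_kb, word_bits):
    # Assumption: capacity is counted in words of `word_bits` bits each.
    return size_kb * 1024 * 8 // word_bits


# Target 4 in the posted mapping: keep Inputs and Outputs, bypass Weights.
mapped = kept_tile_size(DATASPACE_TILES, keep={"Inputs", "Outputs"})
capacity = capacity_in_words(size_kb=64, word_bits=16)
print(mapped, capacity, mapped > capacity)
```

With Weights bypassed, the mapped tile is 62153 entries, which still exceeds the 32768-word capacity of a 64 KB buffer with 16-bit words.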

Re. your most recent question: Correct, permutation does not affect size. And correct, permutations of only non-unit factors affect performance/energy efficiency. In fact, this is something that the mapper exploits to prune the search space.
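One way to picture the equivalence discussed here is to reduce each permutation to the order of its non-unit dimensions. The sketch below is purely illustrative; `effective_order` is a made-up helper, and this is not a description of how the mapper actually implements its pruning.

```python
def effective_order(permutation, factors):
    """Drop dimensions whose factor is 1; only the remaining order matters."""
    factor_of = {token[0]: int(token[1:]) for token in factors.split()}
    return "".join(dim for dim in permutation if factor_of[dim] > 1)


FACTORS = "R1 S1 P1 Q8 C1 K2 N1"

# The three permutations from the question above all reduce to the same order.
for perm in ("QKRSPCN", "RSPCNQK", "QKPCNRS"):
    print(perm, "->", effective_order(perm, FACTORS))

# Swapping Q and K gives a different effective order.
print("KQRSPCN", "->", effective_order("KQRSPCN", FACTORS))
```

Permutations that reduce to the same string describe the same loop nest once unit-trip-count loops are ignored.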

@aleczhanshi
Author

@angshuman-parashar Thanks! It makes a lot of sense. I really appreciate it!

@agarwal-ayushi

agarwal-ayushi commented Jan 22, 2022

Hi @aleczhanshi and @angshuman-parashar : I am facing a similar issue while trying to convert the mapper output map.txt file to .yaml format for the timeloop-model. I am specifically working on the tutorial example: timeloop-accelergy-exercises/workspace/exercises/2020.ispass/timeloop/06-mapper-convlayer-eyeriss

For the mapping given in ref-output: timeloop-mapper.map.txt: here
Motivation for my work: I want to use sparse-opt in the timeloop-model on a particular mapping to study impact of sparsity. timeloop-model uses map.yaml. Hence, this effort.
I wrote a map.yaml file:

    mapping:
      - target: DRAM
        type: temporal
        factors: Q=4 M=4 C=8 P=1 R=1 S=1 N=1
        permutation: CMQPRSN

      - target: shared_glb
        type: temporal
        factors: M=4 P=56 Q=1 R=1 S=1 C=1 N=1
        permutation: QMPRSCN

      - target: shared_glb
        type: spatial
        factors: Q=14 M=1 P=1 C=1 R=1 S=1 N=1
        permutation: QMPCRSN
        split: 1

      - target: DummyBuffer
        type: temporal
        factors: Q=1 M=1 C=1 S=1 P=1 R=1 N=1
        permutation: MSCQPRN

      - target: DummyBuffer
        type: spatial
        factors: Q=1 C=4 S=3 P=1 R=1 N=1 M=1
        permutation: PRNMQSC
        split: 4

      - target: ifmap_spad
        type: temporal
        factors: Q=1 M=1 C=1 S=1 P=1 R=1 N=1
        permutation: CMQSPRN

      - target: weights_spad
        type: temporal
        factors: R=3 C=4 N=1 S=1 P=1 Q=1 M=1
        permutation: CRNSPQM

      - target: psum_spad
        type: temporal
        factors: M=16 R=1 C=1 N=1 S=1 P=1 Q=1
        permutation: MRCNSPQ

However when I run:
timeloop-model arch/eyeriss_like.yaml arch/components/*.yaml prob/VGG02_layer5.yaml trial_map.yaml
I get the error below and have been unable to find the problem in my mapping. No other files have been modified. Any help would be great.

Sparse optimization configuration complete.
ERROR: couldn't map level psum_spad: mapped tile size 33 exceeds buffer capacity 16
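One check a reader could run is to apply the tile-size rule described earlier in this thread to the psum_spad factors. The sketch below rests on assumptions that the thread does not confirm: that psum_spad is the innermost storage level (so only its own factors count), that M plays the role K played earlier, that strides are 1, and that no bypass directives apply to this level because the posted mapping contains none.

```python
# Factors for psum_spad, copied from the mapping posted above.
PSUM_SPAD_FACTORS = "M=16 R=1 C=1 N=1 S=1 P=1 Q=1"


def parse(spec):
    """Turn 'M=16 R=1 ...' into {'M': 16, 'R': 1, ...} (hypothetical helper)."""
    return {key: int(value)
            for key, value in (token.split("=") for token in spec.split())}


t = parse(PSUM_SPAD_FACTORS)

# Assumption: stride = 1; it does not matter here because P = Q = 1.
weights = t["R"] * t["S"] * t["C"] * t["M"]
outputs = t["N"] * t["M"] * t["P"] * t["Q"]
inputs = t["N"] * t["C"] * (t["S"] + (t["Q"] - 1)) * (t["R"] + (t["P"] - 1))

print("all tensors kept:", weights + inputs + outputs)
print("Outputs only:    ", outputs)
```

Under these assumptions, keeping all three tensors gives 33 entries, while Outputs alone gives 16. That would be consistent with the mapping lacking keep/bypass entries like the ones in the .cfg shown earlier, but this is only a guess at where to look.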
