Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to merge list of plans without batching #26

Closed
Risord opened this issue Sep 25, 2017 · 5 comments
Closed

How to merge list of plans without batching #26

Risord opened this issue Sep 25, 2017 · 5 comments

Comments

@Risord
Copy link

Risord commented Sep 25, 2017

Hi

I am creating SQL query which insert potentially thousands of rows. Issue is that there is max length of parameters in query: yiisoft/yii2#10371

If actions doesn't return anything you can just combine bathing and non batching for loops:

plan {
    let chunks =
        data
        |> List.chunkBySize 100
    
    for chunk in chunks do
        for item in batch chunk do
            do! doSomething item
}

On the other hand you need to return data so concurrentList should(?) be used:

Plan.concurrentList
    [ for d in data ->
        doSomething d
    ]

But this batches all the work and may be fail for too big query. So if there would be Plan.nonConcurrentList I could do something like this:

let chunks =
    spots
    |> List.chunkBySize 100

let plans =
    chunks
    |> List.map (fun chunk -> 
        Plan.concurrentList
            [ for d in chunk ->
                doSomething d
            ]
        )

Plan.nonConcurrentList plans

Any hints?

@rkosafo
Copy link
Contributor

rkosafo commented Sep 25, 2017

In this case, what is the output of Plan.nonConcurrentList?

@Risord
Copy link
Author

Risord commented Sep 25, 2017

I tough it could to be just equal to Plan.concurrentList : List<Plan<'a>> -> Plan<List<'a>>
Example from last post was a bit imprecise. Hopefully this is better:

let doAsChunks (doPlan : 'a -> Plan<'b>) (data : List<'a>) : Plan<List<'b>> =
    let chunks : List<List<'a>> =
        data
        |> List.chunkBySize 100
    
    let chunkedPlans : List<Plan<List<'b>>> =
        chunks
        |> List.map (fun chunk -> 
            Plan.concurrentList
                [ for d in chunk ->
                    doPlan d
                ]
            )
    
    chunkedPlans
    |> Plan.nonConcurrentList // List<Plan<'x>> -> Plan<List<'x>> just like Plan.concurrentList
    |> Plan.map (List.collect id) //Hopefully this won't mess anything?

So any tips how to implement Plan.nonConcurrentList?

Thanks!

@Risord Risord closed this as completed Sep 25, 2017
@Risord Risord reopened this Sep 25, 2017
@rkosafo
Copy link
Contributor

rkosafo commented Sep 26, 2017

Ok. After going through Plan's code, below is the sample I could come up with. From the console output, it seems to run sequentially. Do run the code to see if it is what you are looking for. Perhaps, @rspeele can confirm if this is a good approach.

 let test () =
  let doSomething x = Plan.ret x
  let spots = [ 1 .. 100 ]
  let chunks = spots |> List.chunkBySize 10
  let plans =
    chunks
    |> List.map (fun chunk -> 
        Plan.concurrentList
          [ for d in chunk -> doSomething d ])
  let rec nonConcurrent xs =
    match xs with
    | [] -> Plan.ret []
    | x :: xs' ->
      let run xOut =
        printfn "In run. xOut is: %A" xOut
        nonConcurrent xs' |> Plan.map (fun xsOut -> xOut :: xsOut)  
      Plan.bind x run
  nonConcurrent plans
  |> Plan.map (List.collect id)
  |> Execution.execute Execution.ExecutionConfig.Default
  |> fun x -> printfn "Result: %A" x.Result

@rspeele
Copy link
Collaborator

rspeele commented Sep 26, 2017

Hi,

The definition for nonConcurrent proposed by @rkosafo should work.

However, before you jump to using that, have you tested concurrentList with a large number of queries?

When it builds a query batch, Rezoom.SQL won't add a new query to the batch if it would cause the total number of parameters to exceed the limit for the SqlConnection implementation. So it should be fine to create huge batches with concurrentList -- they'll be transparently split into a series of smaller batches as needed. If not, it's a bug in Rezoom.SQL, not Rezoom. Now, this can't help with cases where > 2100 parameters are used in a single query (e.g. select * where Id in @parameterList with parameterList being huge), but you shouldn't have to worry when batching together many queries that are each within the limit.

@Risord
Copy link
Author

Risord commented Sep 27, 2017

Thanks @rkosafo it works perfectly!

@rspeele I have got error with large in-statements but also with inserts. Columns may vary but issue remains. All inserts are currently in this form:

    insert into T
    row
        A = @a,
        B = @b,
        C = @c;
    select scope_identity() as Id; //Table have also auto-increment primary key called 'Id'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants