In [15]:
#r "nuget: Newtonsoft.Json"
#r "nuget: Microsoft.Azure.Cosmos"
using Microsoft.Azure.Cosmos;
using System.Net.Http;
using Newtonsoft.Json;
using System.Collections.ObjectModel;


In [None]:
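// Well-known default key of the local Azure Cosmos DB emulator (not a production secret)
var cstring = "AccountEndpoint=https://localhost:8081/;AccountKey=C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==";
// AllowBulkExecution lets the SDK group concurrent point operations into batches
var client = new CosmosClient(cstring, new CosmosClientOptions() { AllowBulkExecution = true });
// GetDatabase only returns a proxy object, so create the database if it is missing
Database db = (await client.CreateDatabaseIfNotExistsAsync("StackOverflow")).Database;
// Partition posts by owner so each user's posts share one logical partition
ContainerProperties props = new ContainerProperties("Posts3", "/OwnerUserId");
Container postContainer = (await db.CreateContainerIfNotExistsAsync(props, throughput: 4000)).Container;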
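`CreateContainerIfNotExistsAsync` leaves an existing container untouched, so if `Posts3` was created earlier with a different partition key or throughput, the call above will not reveal it. A minimal sketch of a check, assuming the `db` variable from the cell above; `DescribeContainerAsync` is a hypothetical helper name, and it relies only on the SDK v3 methods `Container.ReadContainerAsync` and `Container.ReadThroughputAsync`.

```csharp
#r "nuget: Microsoft.Azure.Cosmos"
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Hypothetical helper: reads back the container definition and its dedicated throughput.
async Task<(string PartitionKeyPath, int? Throughput)> DescribeContainerAsync(Container container)
{
    ContainerResponse definition = await container.ReadContainerAsync();
    // ReadThroughputAsync can return null when the container has no dedicated throughput
    int? throughput = await container.ReadThroughputAsync();
    return (definition.Resource.PartitionKeyPath, throughput);
}

var containerInfo = await DescribeContainerAsync(db.GetContainer("Posts3"));
containerInfo.Display(); // partition key path and provisioned RU/s
```

If the container was freshly created by the cell above, the output should reflect `/OwnerUserId` and 4000 RU/s.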


In [21]:
public class Post
{
    public string id { get; set; }               // Cosmos DB item id: the service requires a lowercase "id" property
    public int PostId { get; set; }
    public string PostBody { get; set; }
    public string Title { get; set; }
    public int ViewCount { get; set; }
    public int AnswerCount { get; set; }
    public int CommentCount { get; set; }
    public int FavoriteCount { get; set; }
    public int AcceptedAnswerId { get; set; }
    public DateTime? CreatedOn { get; set; }
    public DateTime? ClosedDate { get; set; }
    public int OwnerUserId { get; set; }          // Partition key: must match the container's "/OwnerUserId" path (case-sensitive)
    public string OwnerDisplayName { get; set; }
    public string PostType { get; set; }
    public int Score { get; set; }
    public string Tags { get; set; }
    public float[] bodyvector { get; set; }
}

In [25]:
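using System.Threading.Tasks;

// Download the sample data set and deserialize it into Post objects
var json = await new HttpClient().GetStringAsync("https://raw.githubusercontent.com/hsavran/Presentations/refs/heads/main/stackoverflow.json");
var postList = JsonConvert.DeserializeObject<List<Post>>(json);
postList.Count.Display(); // Display the number of posts

// Write the posts to the container in bulk mode.
// AllowBulkExecution only groups operations that are in flight at the same time, so awaiting
// each CreateItemAsync before starting the next leaves it nothing to batch (the one-at-a-time
// version of this loop took about 15 minutes). Start every insert first, then await them together.
var postContainer = db.GetContainer("Posts3");
var insertTasks = new List<Task>(postList.Count);
foreach (var post in postList)
{
    insertTasks.Add(postContainer.CreateItemAsync(post, new PartitionKey(post.OwnerUserId)));
}
await Task.WhenAll(insertTasks);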
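The load cell does not report how many RUs the import consumed or which individual writes failed, and re-running it fails with 409 Conflict because the item ids come from the data file and already exist. A possible follow-up cell, sketched under the assumption that `postList` and `postContainer` from the cell above are in scope, switches to `UpsertItemAsync` and turns each outcome into a value instead of an exception. `TryUpsertAsync` is a hypothetical helper name; `ItemResponse<T>.RequestCharge` and `CosmosException.RequestCharge` are SDK v3 members.

```csharp
#r "nuget: Microsoft.Azure.Cosmos"
using System.Linq;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Hypothetical helper: upserts one item and reports its status and RU charge
// instead of throwing, so one failure does not hide the results of the others.
async Task<(HttpStatusCode Status, double Charge)> TryUpsertAsync<T>(Container container, T item, PartitionKey pk)
{
    try
    {
        ItemResponse<T> response = await container.UpsertItemAsync(item, pk);
        return (response.StatusCode, response.RequestCharge);
    }
    catch (CosmosException ex)
    {
        return (ex.StatusCode, ex.RequestCharge);
    }
}

// Start every upsert before awaiting any of them so bulk mode can batch them
var outcomes = await Task.WhenAll(
    postList.Select(p => TryUpsertAsync(postContainer, p, new PartitionKey(p.OwnerUserId))));

outcomes.Sum(o => o.Charge).Display();               // total RUs consumed by the load
outcomes.Count(o => (int)o.Status >= 400).Display();  // number of writes that failed
```

Upsert overwrites an existing item with the same id and partition key, which makes the cell safe to re-run; if the first copy should win instead, keep `CreateItemAsync` and treat 409 as "already loaded".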

In [39]:
var cmd = "SELECT * FROM c WHERE c.OwnerUserId = 1 and c.PostType = 'Question'";
var postQuery = new QueryDefinition(cmd);
var iterator = postContainer.GetItemQueryIterator<Post>(postQuery);
var results = new List<Post>();

while (iterator.HasMoreResults)
{
    var response = await iterator.ReadNextAsync();    
    results.AddRange(response);
    response.RequestCharge.Display(); // Display the request charge of the response
}
results.Display();

field,value
id,a56e5b7e-ee9b-4a5f-a51b-b0e559e5901c
PostId,20047
PostBody,"<p>We're seeing some pernicious, but rare, deadlock conditions in the Stack Overflow SQL Server 2005 database.</p> <p>I attached the profiler, set up a trace profile using <a href=""http://www.simple-talk.com/sql/learn-sql-server/how-to-track-down-deadlocks-using-sql-server-2005-profiler/"" rel=""noreferrer"">this excellent article on troubleshooting deadlocks</a>, and captured a bunch of examples. The weird thing is that <strong>the deadlocking write is <em>always</em> the same</strong>:</p> <pre><code>UPDATE [dbo].[Posts] SET [AnswerCount] = @p1, [LastActivityDate] = @p2, [LastActivityUserId] = @p3 WHERE [Id] = @p0 </code></pre> <p>The other deadlocking statement varies, but it's usually some kind of trivial, simple <strong>read</strong> of the posts table. This one always gets killed in the deadlock. Here's an example</p> <pre><code>SELECT [t0].[Id], [t0].[PostTypeId], [t0].[Score], [t0].[Views], [t0].[AnswerCount], [t0].[AcceptedAnswerId], [t0].[IsLocked], [t0].[IsLockedEdit], [t0].[ParentId], [t0].[CurrentRevisionId], [t0].[FirstRevisionId], [t0].[LockedReason], [t0].[LastActivityDate], [t0].[LastActivityUserId] FROM [dbo].[Posts] AS [t0] WHERE [t0].[ParentId] = @p0 </code></pre> <p>To be perfectly clear, we are not seeing write / write deadlocks, but read / write.</p> <p>We have a mixture of LINQ and parameterized SQL queries at the moment. We have added <code>with (nolock)</code> to all the SQL queries. This may have helped some. We also had a single (very) poorly-written badge query that I fixed yesterday, which was taking upwards of 20 seconds to run every time, and was running every minute on top of that. I was hoping this was the source of some of the locking problems!</p> <p>Unfortunately, I got another deadlock error about 2 hours ago. Same exact symptoms, same exact culprit write.</p> <p>The truly strange thing is that the locking write SQL statement you see above is part of a very specific code path. 
It's <em>only</em> executed when a new answer is added to a question -- it updates the parent question with the new answer count and last date/user. This is, obviously, not that common relative to the massive number of reads we are doing! As far as I can tell, we're not doing huge numbers of writes anywhere in the app.</p> <p>I realize that NOLOCK is sort of a giant hammer, but most of the queries we run here don't need to be that accurate. Will you care if your user profile is a few seconds out of date?</p> <p>Using NOLOCK with Linq is a bit more difficult as <a href=""http://www.hanselman.com/blog/GettingLINQToSQLAndLINQToEntitiesToUseNOLOCK.aspx"" rel=""noreferrer"">Scott Hanselman discusses here</a>.</p> <p>We are flirting with the idea of using</p> <pre><code>SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED </code></pre> <p>on the base database context so that all our LINQ queries have this set. Without that, we'd have to wrap every LINQ call we make (well, the simple reading ones, which is the vast majority of them) in a 3-4 line transaction code block, which is ugly.</p> <p>I guess I'm a little frustrated that trivial reads in SQL 2005 can deadlock on writes. I could see write/write deadlocks being a huge issue, but <em>reads?</em> We're not running a banking site here, we don't need perfect accuracy every time.</p> <p>Ideas? Thoughts?</p> <hr> <blockquote>  <p>Are you instantiating a new LINQ to SQL DataContext object for every operation or are you perhaps sharing the same static context for all your calls?</p> </blockquote> <p>Jeremy, we are sharing one static datacontext in the base Controller for the most part:</p> <pre><code>private DBContext _db; /// &lt;summary&gt; /// Gets the DataContext to be used by a Request's controllers. 
/// &lt;/summary&gt; public DBContext DB {  get  {  if (_db == null)  {  _db = new DBContext() { SessionName = GetType().Name };  //_db.ExecuteCommand(""SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED"");  }  return _db;  } } </code></pre> <p>Do you recommend we create a new context for every Controller, or per Page, or .. more often?</p>"
Title,Diagnosing Deadlocks in SQL Server 2005
ViewCount,27533
AnswerCount,22
CommentCount,5
FavoriteCount,58
AcceptedAnswerId,21158
CreatedOn,2008-08-21 14:18:41Z

field,value
id,28f9871f-26b6-4774-8e19-dab2c82d6739
PostId,8472
PostBody,"<p>It looks like we'll be adding <a href=""http://en.wikipedia.org/wiki/Captcha"" rel=""noreferrer"">CAPTCHA</a> support to Stack Overflow. This is necessary to prevent bots, spammers, and other malicious scripted activity. We only want human beings to post or edit things here!</p> <p>We'll be using a JavaScript (jQuery) CAPTCHA as a first line of defense:</p> <p><a href=""http://docs.jquery.com/Tutorials:Safer_Contact_Forms_Without_CAPTCHAs"" rel=""noreferrer"">http://docs.jquery.com/Tutorials:Safer_Contact_Forms_Without_CAPTCHAs</a></p> <p>The advantage of this approach is that, <strong>for most people, the CAPTCHA won't ever be visible!</strong></p> <p>However, for people with JavaScript disabled, we still need a fallback and this is where it gets tricky.</p> <p>I have written a <a href=""http://www.codeproject.com/KB/custom-controls/CaptchaControl.aspx"" rel=""noreferrer"">traditional CAPTCHA control for ASP.NET</a> which we can re-use.</p> <p><img src=""https://i.stack.imgur.com/Puvbf.jpg"" alt=""CaptchaImage""></p> <p>However, I'd prefer to go with something textual to avoid the overhead of creating all these images on the server with each request.</p> <p>I've seen things like..</p> <ul> <li>ASCII text captcha: <code>\/\/(_)\/\/</code></li> <li>math puzzles: what is 7 minus 3 times 2?</li> <li>trivia questions: what tastes better, a toad or a popsicle?</li> </ul> <p>Maybe I'm just tilting at windmills here, but I'd like to have a less resource intensive, non-image based <code>&lt;noscript&gt;</code> compatible CAPTCHA if possible.</p> <p>Ideas?</p>"
Title,Practical non-image based CAPTCHA approaches?
ViewCount,72431
AnswerCount,103
CommentCount,25
FavoriteCount,326
AcceptedAnswerId,8637
CreatedOn,2008-08-12 04:59:35Z

field,value
id,e341c49f-d553-4f10-b0de-18bc8fd3480c
PostId,9
PostBody,"<p>Given a <code>DateTime</code> representing a person's birthday, how do I calculate their age in years? </p>"
Title,How do I calculate someone's age in C#?
ViewCount,480476
AnswerCount,64
CommentCount,7
FavoriteCount,399
AcceptedAnswerId,1404
CreatedOn,2008-07-31 23:40:59Z

field,value
id,bb87518d-489d-4ee5-a19d-8835c96596a9
PostId,11
PostBody,"<p>Given a specific <code>DateTime</code> value, how do I display relative time, like:</p> <ul> <li>2 hours ago</li> <li>3 days ago</li> <li>a month ago</li> </ul>"
Title,Calculate relative time in C#
ViewCount,136033
AnswerCount,35
CommentCount,3
FavoriteCount,529
AcceptedAnswerId,1248
CreatedOn,2008-07-31 23:55:37Z
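The query cell above embeds its filter values directly in the SQL text and leaves partition routing to the query plan. Because `OwnerUserId` is the partition key, the same question can be sent to one logical partition explicitly, with parameters instead of literals, while summing the RU charge across pages rather than printing it page by page. A sketch, assuming the `Post` class and `postContainer` from earlier cells; `QueryPartitionAsync` is a hypothetical helper, while `QueryDefinition.WithParameter` and `QueryRequestOptions.PartitionKey` are SDK v3 members.

```csharp
#r "nuget: Microsoft.Azure.Cosmos"
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Hypothetical helper: runs a query against a single logical partition and returns
// all items together with the total RU charge across every page.
async Task<(List<T> Items, double TotalCharge)> QueryPartitionAsync<T>(
    Container container, QueryDefinition query, PartitionKey partitionKey)
{
    var options = new QueryRequestOptions { PartitionKey = partitionKey };
    var items = new List<T>();
    double totalCharge = 0;
    using FeedIterator<T> iterator = container.GetItemQueryIterator<T>(query, requestOptions: options);
    while (iterator.HasMoreResults)
    {
        FeedResponse<T> page = await iterator.ReadNextAsync();
        items.AddRange(page);
        totalCharge += page.RequestCharge;
    }
    return (items, totalCharge);
}

// Parameterized version of the query above
var questionsQuery = new QueryDefinition(
        "SELECT * FROM c WHERE c.OwnerUserId = @ownerId AND c.PostType = @postType")
    .WithParameter("@ownerId", 1)
    .WithParameter("@postType", "Question");

var (questions, charge) = await QueryPartitionAsync<Post>(postContainer, questionsQuery, new PartitionKey(1));
charge.Display();     // total RUs for the whole query
questions.Display();
```

Parameters also keep user-supplied values out of the SQL string, which matters once the filter values are no longer hard-coded.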
