An optimal stratified sample design for Commodity Flow Survey (CFS) based on Simulated Annealing and Genetic Algorithm. A script in Procedural PostgreSQL is used to generate a frame with 100,000 records based on publicly available data.

saeedt/CFS_Sampling

CFS Sample Design Data and Scripts

This repository stores the data and scripts for the proposed CFS sample design. The folders and their contents are listed below.

Raw_Data

Main data sources used for generating the sample data are stored in this folder.

SQL

SQL scripts used to create tables and analyze the raw data are stored in this folder. We used PostgreSQL, a free, open-source database management system (DBMS). The queries and functions can be run on PostgreSQL 9.6 or later. Running them on other SQL-compatible DBMSs such as MySQL/MariaDB or MS SQL Server may require minor modifications.

  • SQL_Scripts.sql includes the scripts for creating tables and all queries developed for cleaning and aggregating the data. The comments in this file provide a high-level explanation of each step. We used Common Table Expressions (CTEs) to merge multiple related queries into a single step.
  • `Generate_est.sql` includes a function written in procedural PostgreSQL (PL/pgSQL) that generates a sampling frame with user-defined parameters based on the CBP (County Business Patterns) and FAF (Freight Analysis Framework) datasets.
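
The frame generator itself is the PL/pgSQL function in `Generate_est.sql`. Purely as an illustration of the general idea, the Python sketch below assumes a frame can be built by (1) allocating a target number of records across county × NAICS cells in proportion to CBP establishment counts and (2) splitting each cell's shipment value across its records with skewed lognormal weights. `Cell`, `allocate`, `generate_frame`, `sigma` and the cell values are hypothetical and do not mirror the actual function's parameters or logic.

```python
import random
from dataclasses import dataclass


@dataclass
class Cell:
    # Hypothetical input cell; field names are illustrative only.
    fips: str     # county FIPS code
    naics: str    # NAICS industry code
    est: int      # CBP establishment count, used as the allocation weight
    value: float  # shipment value assigned to this county x NAICS cell


def allocate(cells, n_total):
    """Largest-remainder allocation of n_total records proportional to est."""
    total_est = sum(c.est for c in cells)
    quotas = [n_total * c.est / total_est for c in cells]
    counts = [int(q) for q in quotas]
    # Give the leftover records to the cells with the largest fractional parts.
    order = sorted(range(len(cells)), key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in order[: n_total - sum(counts)]:
        counts[i] += 1
    return counts


def generate_frame(cells, n_total, sigma=1.0, seed=0):
    """Return (id, fips, naics, value) records; each cell's values sum to cell.value."""
    rng = random.Random(seed)
    frame = []
    for cell, k in zip(cells, allocate(cells, n_total)):
        if k == 0:
            continue
        # Skewed establishment sizes, rescaled so the cell total is preserved.
        weights = [rng.lognormvariate(0.0, sigma) for _ in range(k)]
        s = sum(weights)
        for w in weights:
            frame.append((len(frame) + 1, cell.fips, cell.naics, cell.value * w / s))
    return frame


cells = [
    Cell("01001", "311111", 30, 5.0e6),
    Cell("01001", "423110", 10, 2.0e6),
    Cell("01003", "312111", 20, 1.0e6),
]
frame = generate_frame(cells, n_total=12)
```

Largest-remainder rounding keeps the record count exactly at `n_total`, and rescaling the weights preserves each cell's total, so aggregating the synthetic frame by cell reproduces the inputs.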

Final_Data

Contains the final output of the scripts in the SQL folder applied to the data in Raw_Data.

  • fafcbp.csv combines the FAF and CBP datasets in CSV format: it contains the FAF data disaggregated by county and NAICS code based on CBP data. This file is required by the generate_est function in the SQL folder.
  • 100K_Frame_newCFS.csv is a set of 100,000 establishments generated with the generate_est function.
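
One common way to disaggregate FAF data with CBP, shown here as an assumption rather than the repository's exact procedure, is to split each FAF (zone, SCTG commodity) value across the counties in that zone and the NAICS industries mapped to that commodity, in proportion to CBP employment. The dictionaries, the `sctg_naics` crosswalk and the `disaggregate` function below are hypothetical toy inputs and names.

```python
from collections import defaultdict

# Hypothetical toy inputs (illustrative values only).
faf = {("zone_A", "sctg_07"): 900.0,   # shipment value by (FAF zone, SCTG commodity)
       ("zone_B", "sctg_07"): 50.0}    # zone with no CBP support in this toy data
county_zone = {"01001": "zone_A", "01003": "zone_A"}   # county FIPS -> FAF zone
sctg_naics = {"sctg_07": ["311111", "311119"]}          # commodity -> producing NAICS
cbp_emp = {("01001", "311111"): 100,                    # CBP employment by (county, NAICS)
           ("01001", "311119"): 50,
           ("01003", "311111"): 150}


def disaggregate(faf, county_zone, sctg_naics, cbp_emp):
    """Split each (zone, commodity) value over eligible county x NAICS cells by employment."""
    out = defaultdict(float)
    for (zone, sctg), value in faf.items():
        naics_set = set(sctg_naics.get(sctg, []))
        # Employment of the county x NAICS cells inside this zone that produce the commodity.
        cells = {k: e for k, e in cbp_emp.items()
                 if county_zone.get(k[0]) == zone and k[1] in naics_set and e > 0}
        total = sum(cells.values())
        if total == 0:
            continue  # no CBP support: value left unallocated in this sketch
        for (fips, naics), e in cells.items():
            out[(fips, naics, sctg)] += value * e / total
    return dict(out)


result = disaggregate(faf, county_zone, sctg_naics, cbp_emp)
```

Cells without matching CBP employment are simply skipped here; a real pipeline would need an explicit rule for that value.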

R_Scripts

Includes the R scripts and functions used in the document.
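
The repository description states that the sample design is optimized with Simulated Annealing and a Genetic Algorithm. The Python sketch below is not a port of the R scripts; it only illustrates the simulated-annealing idea under assumed choices: a single skewed size variable, stratum boundaries searched as cut points in the sorted data, and the objective Σ N_h S_h, which under Neyman allocation with a fixed sample size (ignoring the finite population correction) is proportional to the standard error of the stratified mean. `neyman_objective`, `anneal_cuts`, the cooling schedule and the step size are hypothetical.

```python
import math
import random


def neyman_objective(sorted_x, cuts):
    """Sum of N_h * S_h over strata defined by cut indices into sorted_x.

    Under Neyman allocation with fixed n (no fpc), Var(stratified mean) =
    (sum_h N_h S_h)^2 / (N^2 n), so minimizing this sum minimizes the variance.
    """
    bounds = [0, *cuts, len(sorted_x)]
    total = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        stratum = sorted_x[lo:hi]
        n_h = len(stratum)
        if n_h < 2:
            return math.inf  # S_h needs at least two units
        mean = sum(stratum) / n_h
        s2 = sum((v - mean) ** 2 for v in stratum) / (n_h - 1)
        total += n_h * math.sqrt(s2)
    return total


def anneal_cuts(x, n_strata, iters=5000, cooling=0.999, seed=1):
    """Simulated annealing over stratum boundaries (requires n_strata >= 2)."""
    rng = random.Random(seed)
    xs = sorted(x)
    N = len(xs)
    cuts = [N * h // n_strata for h in range(1, n_strata)]  # equal-count start
    cur = neyman_objective(xs, cuts)
    best, best_cuts = cur, cuts[:]
    t = max(0.01 * cur, 1e-9)  # initial temperature: 1% of the starting objective
    step = max(1, N // 50)     # largest boundary shift per move
    for _ in range(iters):
        cand = cuts[:]
        i = rng.randrange(len(cand))
        cand[i] += rng.choice((-1, 1)) * rng.randint(1, step)
        cand.sort()
        if cand[0] < 1 or cand[-1] > N - 1:
            continue  # boundary moved outside the population
        new = neyman_objective(xs, cand)
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if new <= cur or rng.random() < math.exp((cur - new) / t):
            cuts, cur = cand, new
            if cur < best:
                best, best_cuts = cur, cuts[:]
        t *= cooling
    return best_cuts, best


data_rng = random.Random(42)
x = [data_rng.lognormvariate(0.0, 1.5) for _ in range(300)]  # skewed, like shipment values
cuts, obj = anneal_cuts(x, n_strata=3, iters=2000)
```

A genetic algorithm could reuse the same objective, replacing the single annealing walk with a population of cut vectors evolved by crossover and mutation.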
