Subdir templates #34

Closed
wants to merge 2 commits into from

3 participants

Mike Lewis Sandro Tosi James Pearce
Mike Lewis

Hi,

I made some tweaks that allow subdirectories to be defined using ${YEAR}, ${MONTH}, ${DAY}, and ${HOUR}, which makes simple partitioning possible with HDFS and the like. It is much more flexible than the "store_tree" change that was created a while ago (and which looks like it was later removed).

If you'd like me to make some changes just let me know.

Thanks,
Mike

Mike Lewis added some commits
Mike Lewis Added boost regex library to support filename templates 0acb954
Mike Lewis Modified FileStoreBase to support using subdirectories based on current date

This involved the following changes:
  * remove filePath from FileStoreBase
    and replace it with a makeFilePath method,
    which takes a tm object
  * make every function that formerly
    referenced filePath require
    a tm argument

This will substitute ${YEAR}, ${MONTH}, ${DAY}, and ${HOUR} in
the 'sub_directory' argument in the configuration file. ex:

  sub_directory=${YEAR}/${MONTH}/${DAY}

Or, to make it work with Hive partitions:

  sub_directory=dt=${YEAR}${MONTH}${DAY}
5f87c08
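
The substitution described in this commit message can be sketched with the standard library alone. This is only an illustration: the patch itself uses boost::regex_replace, whereas the sketch below swaps in plain std::string find/replace, and the names pad, replaceAll, and expandSubDirectory are hypothetical helpers that do not appear in the patch.

```cpp
#include <cassert>
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Zero-pad a number to the given width, e.g. pad(9, 2) gives "09".
static std::string pad(int value, int width) {
  std::ostringstream out;
  out << std::setw(width) << std::setfill('0') << value;
  return out.str();
}

// Replace every occurrence of `token` in `text` with `value`.
static void replaceAll(std::string& text, const std::string& token,
                       const std::string& value) {
  assert(!token.empty());  // an empty token would never advance
  std::string::size_type pos = 0;
  while ((pos = text.find(token, pos)) != std::string::npos) {
    text.replace(pos, token.size(), value);
    pos += value.size();  // skip past the inserted text
  }
}

// Hypothetical helper (not part of the patch): expands the four placeholders
// named in the commit message using the fields of a struct tm.
std::string expandSubDirectory(const std::string& pattern,
                               const std::tm& when) {
  std::string result = pattern;
  replaceAll(result, "${YEAR}", pad(when.tm_year + 1900, 4));  // tm_year counts from 1900
  replaceAll(result, "${MONTH}", pad(when.tm_mon + 1, 2));     // tm_mon is 0-based
  replaceAll(result, "${DAY}", pad(when.tm_mday, 2));
  replaceAll(result, "${HOUR}", pad(when.tm_hour, 2));
  return result;
}
```

Under these assumptions, a timestamp of 2010-11-09 07:00 would turn ${YEAR}/${MONTH}/${DAY} into 2010/11/09 and dt=${YEAR}${MONTH}${DAY} into dt=20101109, while a sub_directory value without placeholders would pass through unchanged.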
Mike Lewis

Hey this is a test

Sandro Tosi

Hi Mike,
this is a very interesting feature and we're considering adding the patch to our scribe log server instance, but I have a comment :)

AFAIUI, once the patch is in, all the paths for an hourly rotated store are split into subdirs, while I see cases where one wants to enable this method only in some stores (the ones with more traffic/data) while leaving the default behavior in the others.

So, would you consider adding a config option to enable/disable the "split paths" in the store configuration? I don't have the coding skills to do that myself, hence why I'm asking :)

James Pearce

Facebook has not maintained or supported Scribe externally for some time, and we are closing its old and outstanding pull requests.

Many, many thanks for your support of the project. If you have any further questions, please don't hesitate to let me know.

Commits on Nov 9, 2010
  1. Modified FileStoreBase to support using subdirectories based on current date

    Mike Lewis authored
111 aclocal/ax_boost_regex.m4
@@ -0,0 +1,111 @@
+# ===========================================================================
+# http://www.gnu.org/software/autoconf-archive/ax_boost_regex.html
+# ===========================================================================
+#
+# SYNOPSIS
+#
+# AX_BOOST_REGEX
+#
+# DESCRIPTION
+#
+# Test for Regex library from the Boost C++ libraries. The macro requires
+# a preceding call to AX_BOOST_BASE. Further documentation is available at
+# <http://randspringer.de/boost/index.html>.
+#
+# This macro calls:
+#
+# AC_SUBST(BOOST_REGEX_LIB)
+#
+# And sets:
+#
+# HAVE_BOOST_REGEX
+#
+# LICENSE
+#
+# Copyright (c) 2008 Thomas Porschberg <thomas@randspringer.de>
+# Copyright (c) 2008 Michael Tindal
+#
+# Copying and distribution of this file, with or without modification, are
+# permitted in any medium without royalty provided the copyright notice
+# and this notice are preserved. This file is offered as-is, without any
+# warranty.
+
+#serial 19
+
+AC_DEFUN([AX_BOOST_REGEX],
+[
+ AC_ARG_WITH([boost-regex],
+ AS_HELP_STRING([--with-boost-regex@<:@=special-lib@:>@],
+ [use the Regex library from boost - it is possible to specify a certain library for the linker
+ e.g. --with-boost-regex=boost_regex-gcc-mt-d-1_33_1 ]),
+ [
+ if test "$withval" = "no"; then
+ want_boost="no"
+ elif test "$withval" = "yes"; then
+ want_boost="yes"
+ ax_boost_user_regex_lib=""
+ else
+ want_boost="yes"
+ ax_boost_user_regex_lib="$withval"
+ fi
+ ],
+ [want_boost="yes"]
+ )
+
+ if test "x$want_boost" = "xyes"; then
+ AC_REQUIRE([AC_PROG_CC])
+ CPPFLAGS_SAVED="$CPPFLAGS"
+ CPPFLAGS="$CPPFLAGS $BOOST_CPPFLAGS"
+ export CPPFLAGS
+
+ LDFLAGS_SAVED="$LDFLAGS"
+ LDFLAGS="$LDFLAGS $BOOST_LDFLAGS"
+ export LDFLAGS
+
+ AC_CACHE_CHECK(whether the Boost::Regex library is available,
+ ax_cv_boost_regex,
+ [AC_LANG_PUSH([C++])
+ AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[@%:@include <boost/regex.hpp>
+ ]],
+ [[boost::regex r(); return 0;]])],
+ ax_cv_boost_regex=yes, ax_cv_boost_regex=no)
+ AC_LANG_POP([C++])
+ ])
+ if test "x$ax_cv_boost_regex" = "xyes"; then
+ AC_DEFINE(HAVE_BOOST_REGEX,,[define if the Boost::Regex library is available])
+ BOOSTLIBDIR=`echo $BOOST_LDFLAGS | sed -e 's/@<:@^\/@:>@*//'`
+ if test "x$ax_boost_user_regex_lib" = "x"; then
+ for libextension in `ls $BOOSTLIBDIR/libboost_regex*.so* $BOOSTLIBDIR/libboost_regex*.a* 2>/dev/null | sed 's,.*/,,' | sed -e 's;^lib\(boost_regex.*\)\.so.*$;\1;' -e 's;^lib\(boost_regex.*\)\.a*$;\1;'` ; do
+ ax_lib=${libextension}
+ AC_CHECK_LIB($ax_lib, exit,
+ [BOOST_REGEX_LIB="-l$ax_lib"; AC_SUBST(BOOST_REGEX_LIB) link_regex="yes"; break],
+ [link_regex="no"])
+ done
+ if test "x$link_regex" != "xyes"; then
+ for libextension in `ls $BOOSTLIBDIR/boost_regex*.{dll,a}* 2>/dev/null | sed 's,.*/,,' | sed -e 's;^\(boost_regex.*\)\.dll.*$;\1;' -e 's;^\(boost_regex.*\)\.a*$;\1;'` ; do
+ ax_lib=${libextension}
+ AC_CHECK_LIB($ax_lib, exit,
+ [BOOST_REGEX_LIB="-l$ax_lib"; AC_SUBST(BOOST_REGEX_LIB) link_regex="yes"; break],
+ [link_regex="no"])
+ done
+ fi
+
+ else
+ for ax_lib in $ax_boost_user_regex_lib boost_regex-$ax_boost_user_regex_lib; do
+ AC_CHECK_LIB($ax_lib, main,
+ [BOOST_REGEX_LIB="-l$ax_lib"; AC_SUBST(BOOST_REGEX_LIB) link_regex="yes"; break],
+ [link_regex="no"])
+ done
+ fi
+ if test "x$ax_lib" = "x"; then
+ AC_MSG_ERROR(Could not find a version of the library!)
+ fi
+ if test "x$link_regex" != "xyes"; then
+ AC_MSG_ERROR(Could not link against $ax_lib !)
+ fi
+ fi
+
+ CPPFLAGS="$CPPFLAGS_SAVED"
+ LDFLAGS="$LDFLAGS_SAVED"
+ fi
+])
1  configure.ac
@@ -63,6 +63,7 @@ FB_WITH_PATH([hadoop_home], [hadooppath], [/usr/local])
AX_BOOST_BASE([1.36])
AX_BOOST_SYSTEM
AX_BOOST_FILESYSTEM
+AX_BOOST_REGEX
# Generates Makefile from Makefile.am. Modify when new subdirs are added.
# Change Makefile.am also to add subdirectly.
2  src/Makefile.am
@@ -72,7 +72,7 @@ AM_CPPFLAGS += -I$(hadoop_home)/include
AM_CPPFLAGS += $(BOOST_CPPFLAGS)
AM_CPPFLAGS += $(FB_CPPFLAGS) $(DEBUG_CPPFLAGS)
-AM_LDFLAGS = $(BOOST_LDFLAGS) $(BOOST_SYSTEM_LIB) $(BOOST_FILESYSTEM_LIB)
+AM_LDFLAGS = $(BOOST_LDFLAGS) $(BOOST_SYSTEM_LIB) $(BOOST_FILESYSTEM_LIB) $(BOOST_REGEX_LIB)
# Section 3 #############################################################################
# GENERATE BUILD RULES
85 src/store.cpp
@@ -25,6 +25,7 @@
// @author John Song
#include <algorithm>
+#include <boost/regex.hpp>
#include "common.h"
#include "scribe_server.h"
#include "network_dynamic_config.h"
@@ -182,7 +183,6 @@ FileStoreBase::FileStoreBase(StoreQueue* storeq,
: Store(storeq, category, type, multi_category),
baseFilePath("/tmp"),
subDirectory(""),
- filePath("/tmp"),
baseFileName(category),
baseSymlinkName(""),
maxSize(DEFAULT_FILESTORE_MAX_SIZE),
@@ -220,11 +220,6 @@ void FileStoreBase::configure(pStoreConf configuration, pStoreConf parent) {
setHostNameSubDir();
}
- filePath = baseFilePath;
- if (!subDirectory.empty()) {
- filePath += "/" + subDirectory;
- }
-
if (!configuration->getString("base_filename", baseFileName)) {
LOG_OPER(
@@ -349,10 +344,6 @@ void FileStoreBase::copyCommon(const FileStoreBase *base) {
* unique
*/
baseFilePath = base->baseFilePath + std::string("/") + categoryHandled;
- filePath = baseFilePath;
- if (!subDirectory.empty()) {
- filePath += "/" + subDirectory;
- }
baseFileName = categoryHandled;
}
@@ -405,17 +396,47 @@ void FileStoreBase::rotateFile(time_t currentTime) {
makeBaseFilename(&timeinfo).c_str(), currentSize,
maxSize == ULONG_MAX ? 0 : maxSize);
- printStats();
+ printStats(&timeinfo);
openInternal(true, &timeinfo);
}
+static const boost::regex filename_replace_re(
+ "(\\$\\{YEAR\\})|"
+ "(\\$\\{MONTH\\})|"
+ "(\\$\\{DAY\\})|"
+ "(\\$\\{HOUR\\})");
+
+
+string FileStoreBase::substituteFilenameString(string &orig_filename, struct tm* creation_time) {
+ ostringstream formatstring;
+ formatstring
+ << "(?{1}" << creation_time->tm_year + 1900 << ')'
+ << "(?{2}" << setw(2) << setfill('0') << creation_time->tm_mon + 1 << ')'
+ << "(?{3}" << setw(2) << setfill('0') << creation_time->tm_mday << ')'
+ << "(?{4}" << setw(2) << setfill('0') << creation_time->tm_hour << ')'
+ ;
+
+ return boost::regex_replace(orig_filename, filename_replace_re, formatstring.str(), boost::match_default | boost::format_all);
+}
+
+string FileStoreBase::makeFilePath(struct tm* creation_time) {
+ if (!subDirectory.empty()) {
+ return baseFilePath + "/" + substituteFilenameString(subDirectory, creation_time);
+ } else {
+ return baseFilePath;
+ }
+}
+
+
string FileStoreBase::makeFullFilename(int suffix, struct tm* creation_time,
bool use_full_path) {
ostringstream filename;
if (use_full_path) {
- filename << filePath << '/';
+ filename << makeFilePath(creation_time) << '/';
+ } else if (!subDirectory.empty()) {
+ filename << substituteFilenameString(subDirectory, creation_time) << '/';
}
filename << makeBaseFilename(creation_time);
filename << '_' << setw(5) << setfill('0') << suffix;
@@ -433,9 +454,9 @@ string FileStoreBase::makeBaseSymlink() {
return base.str();
}
-string FileStoreBase::makeFullSymlink() {
+string FileStoreBase::makeFullSymlink(struct tm* creation_time) {
ostringstream filename;
- filename << filePath << '/' << makeBaseSymlink();
+ filename << makeFilePath(creation_time) << '/' << makeBaseSymlink();
return filename.str();
}
@@ -453,8 +474,10 @@ string FileStoreBase::makeBaseFilename(struct tm* creation_time) {
}
// returns the suffix of the newest file matching base_filename
-int FileStoreBase::findNewestFile(const string& base_filename) {
+int FileStoreBase::findNewestFile(struct tm* creation_time) {
+ string base_filename = makeBaseFilename(creation_time);
+ string filePath = makeFilePath(creation_time);
std::vector<std::string> files = FileInterface::list(filePath, fsType);
int max_suffix = -1;
@@ -471,7 +494,10 @@ int FileStoreBase::findNewestFile(const string& base_filename) {
return max_suffix;
}
-int FileStoreBase::findOldestFile(const string& base_filename) {
+int FileStoreBase::findOldestFile(struct tm* creation_time) {
+
+ string base_filename = makeBaseFilename(creation_time);
+ string filePath = makeFilePath(creation_time);
std::vector<std::string> files = FileInterface::list(filePath, fsType);
@@ -507,12 +533,13 @@ int FileStoreBase::getFileSuffix(const string& filename,
return suffix;
}
-void FileStoreBase::printStats() {
+void FileStoreBase::printStats(struct tm* creation_time) {
if (!writeStats) {
return;
}
- string filename(filePath);
+ string filePath = makeFilePath(creation_time);
+ string filename = filePath;
filename += "/scribe_stats";
boost::shared_ptr<FileInterface> stats_file =
@@ -641,8 +668,10 @@ bool FileStore::openInternal(bool incrementFilename, struct tm* current_time) {
current_time = &timeinfo;
}
+ string filePath = makeFilePath(current_time);
+
try {
- int suffix = findNewestFile(makeBaseFilename(current_time));
+ int suffix = findNewestFile(current_time);
if (incrementFilename) {
++suffix;
@@ -710,7 +739,7 @@ bool FileStore::openInternal(bool incrementFilename, struct tm* current_time) {
/* just make a best effort here, and don't error if it fails */
if (createSymlink && !isBufferFile) {
- string symlinkName = makeFullSymlink();
+ string symlinkName = makeFullSymlink(current_time);
boost::shared_ptr<FileInterface> tmp =
FileInterface::createFileInterface(fsType, symlinkName, isBufferFile);
tmp->deleteFile();
@@ -903,7 +932,7 @@ bool FileStore::writeMessages(boost::shared_ptr<logentry_vector_t> messages,
// currently gets invoked from within a bufferstore
void FileStore::deleteOldest(struct tm* now) {
- int index = findOldestFile(makeBaseFilename(now));
+ int index = findOldestFile(now);
if (index < 0) {
return;
}
@@ -920,7 +949,7 @@ void FileStore::deleteOldest(struct tm* now) {
bool FileStore::replaceOldest(boost::shared_ptr<logentry_vector_t> messages,
struct tm* now) {
string base_name = makeBaseFilename(now);
- int index = findOldestFile(base_name);
+ int index = findOldestFile(now);
if (index < 0) {
LOG_OPER("[%s] Could not find files <%s>", categoryHandled.c_str(), base_name.c_str());
return false;
@@ -957,7 +986,7 @@ bool FileStore::readOldest(/*out*/ boost::shared_ptr<logentry_vector_t> messages
long loss;
- int index = findOldestFile(makeBaseFilename(now));
+ int index = findOldestFile(now);
if (index < 0) {
// This isn't an error. It's legit to call readOldest when there aren't any
// files left, in which case the call succeeds but returns messages empty.
@@ -1015,6 +1044,7 @@ bool FileStore::readOldest(/*out*/ boost::shared_ptr<logentry_vector_t> messages
}
bool FileStore::empty(struct tm* now) {
+ string filePath = makeFilePath(now);
std::vector<std::string> files = FileInterface::list(filePath, fsType);
std::string base_filename = makeBaseFilename(now);
@@ -1135,7 +1165,7 @@ bool ThriftFileStore::openInternal(bool incrementFilename, struct tm* current_ti
}
int suffix;
try {
- suffix = findNewestFile(makeBaseFilename(current_time));
+ suffix = findNewestFile(current_time);
} catch(const std::exception& e) {
LOG_OPER("Exception < %s > in ThriftFileStore::openInternal",
e.what());
@@ -1153,7 +1183,7 @@ bool ThriftFileStore::openInternal(bool incrementFilename, struct tm* current_ti
string filename = makeFullFilename(suffix, current_time);
/* try to create the directory containing the file */
- if (!createFileDirectory()) {
+ if (!createFileDirectory(current_time)) {
LOG_OPER("[%s] Could not create path for file: %s",
categoryHandled.c_str(), filename.c_str());
return false;
@@ -1213,7 +1243,7 @@ bool ThriftFileStore::openInternal(bool incrementFilename, struct tm* current_ti
/* just make a best effort here, and don't error if it fails */
if (createSymlink) {
- string symlinkName = makeFullSymlink();
+ string symlinkName = makeFullSymlink(current_time);
unlink(symlinkName.c_str());
string symtarget = makeFullFilename(suffix, current_time, false);
symlink(symtarget.c_str(), symlinkName.c_str());
@@ -1222,7 +1252,8 @@ bool ThriftFileStore::openInternal(bool incrementFilename, struct tm* current_ti
return true;
}
-bool ThriftFileStore::createFileDirectory () {
+bool ThriftFileStore::createFileDirectory (struct tm* current_time) {
+ string filePath = makeFilePath(current_time);
try {
boost::filesystem::create_directories(filePath);
} catch(const std::exception& e) {
16 src/store.h
@@ -129,21 +129,26 @@ class FileStoreBase : public Store {
// appends information about the current file to a log file in the same
// directory
- virtual void printStats();
+ virtual void printStats(struct tm* creation_time);
// Returns the number of bytes to pad to align to the specified block size
unsigned long bytesToPad(unsigned long next_message_length,
unsigned long current_file_size,
unsigned long chunk_size);
+ std::string makeFilePath(struct tm* creation_time);
+
// A full filename includes an absolute path and a sequence number suffix.
std::string makeBaseFilename(struct tm* creation_time);
std::string makeFullFilename(int suffix, struct tm* creation_time,
bool use_full_path = true);
+
+ std::string substituteFilenameString(std::string &orig_filename, struct tm* creation_time);
+
std::string makeBaseSymlink();
- std::string makeFullSymlink();
- int findOldestFile(const std::string& base_filename);
- int findNewestFile(const std::string& base_filename);
+ std::string makeFullSymlink(struct tm* creation_time);
+ int findOldestFile(struct tm* creation_time);
+ int findNewestFile(struct tm* creation_time);
int getFileSuffix(const std::string& filename,
const std::string& base_filename);
void setHostNameSubDir();
@@ -151,7 +156,6 @@ class FileStoreBase : public Store {
// Configuration
std::string baseFilePath;
std::string subDirectory;
- std::string filePath;
std::string baseFileName;
std::string baseSymlinkName;
unsigned long maxSize;
@@ -249,7 +253,7 @@ class ThriftFileStore : public FileStoreBase {
void configure(pStoreConf configuration, pStoreConf parent);
void close();
void flush();
- bool createFileDirectory();
+ bool createFileDirectory(struct tm* current_time);
protected:
// Implement FileStoreBase virtual function
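
The signature changes above follow from dropping the cached filePath member: the directory now depends on the timestamp, so each caller has to pass a struct tm. A minimal sketch of that idea follows, assuming a hypothetical MiniStore struct that stands in for the relevant FileStoreBase members and handles only ${HOUR} for brevity. It is not Scribe code.

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical stand-in for the relevant FileStoreBase members; not Scribe code.
struct MiniStore {
  std::string baseFilePath;
  std::string subDirectory;  // only ${HOUR} is expanded in this sketch

  // Same shape as makeFilePath in the diff: nothing is cached, the
  // directory is rebuilt from the timestamp on every call.
  std::string makeFilePath(const std::tm* creation_time) const {
    assert(creation_time != nullptr);
    if (subDirectory.empty()) {
      return baseFilePath;  // no sub_directory configured
    }
    char hour[16];
    std::snprintf(hour, sizeof(hour), "%02d", creation_time->tm_hour);
    std::string dir = subDirectory;
    const std::string token = "${HOUR}";
    std::string::size_type pos = dir.find(token);
    if (pos != std::string::npos) {
      dir.replace(pos, token.size(), hour);
    }
    return baseFilePath + "/" + dir;
  }
};
```

With baseFilePath set to /tmp/scribe and subDirectory set to hour=${HOUR}, a timestamp at 09:xx would resolve to /tmp/scribe/hour=09 and one at 10:xx to /tmp/scribe/hour=10, which is why a single stored path can no longer serve every call.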